Juraj Hromkovic 




Algorithmics for 
Hard Problems 



Introduction to Combinatorial Optimization, 
Randomization, Approximation, and Heuristics 

2nd Edition 




Springer 



Texts in Theoretical Computer Science 
An EATCS Series 



Editors: W. Brauer G. Rozenberg A. Salomaa 

On behalf of the European Association 
for Theoretical Computer Science (EATCS) 

Advisory Board: G. Ausiello M. Broy C.S. Calude 
S. Even J. Hartmanis N. Jones T. Leighton M. Nivat 
C. Papadimitriou D. Scott 




Springer-Verlag Berlin Heidelberg GmbH




Juraj Hromkovic 



Algorithmics 
for Hard Problems 

Introduction to Combinatorial Optimization,
Randomization, Approximation,
and Heuristics



Second Edition 



With 71 Figures 




Springer 




Author 



Prof. Dr. Juraj Hromkovič
Swiss Federal Institute of Technology 
Department of Computer Science 
ETH Zurich, ETH Zentrum RZ F2 
8092 Zurich, Switzerland 

Series Editors 

Prof. Dr. Wilfried Brauer 

Institut für Informatik, Technische Universität München
Boltzmannstr. 3, 85748 Garching bei München, Germany
Brauer@informatik.tu-muenchen.de 

Prof. Dr. Grzegorz Rozenberg 

Leiden Institute of Advanced Computer Science 

University of Leiden 

Niels Bohrweg 1, 2333 CA Leiden, The Netherlands 
rozenber@liacs.nl 

Prof. Dr. Arto Salomaa 
Turku Centre for Computer Science 
Lemminkäisenkatu 14 A, 20520 Turku, Finland
asalomaa@utu.fi 

Corrected printing 2004 

Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Hromkovič, Juraj: Algorithmics for hard problems / J. Hromkovič. -

(Texts in theoretical computer science) 

ISBN 978-3-642-07909-2 ISBN 978-3-662-05269-3 (eBook) 
DOI 10.1007/978-3-662-05269-3 

ACM Computing Classification (1998): 

F.2, F.1.2-3, I.1.2, G.1.2, G.1.6, G.2.1, G.3, I.2.8

ISBN 978-3-642-07909-2 



This work is subject to copyright. All rights are reserved, whether the whole or part of 
the material is concerned, specifically the rights of translation, reprinting, reuse of illus- 
trations, recitation, broadcasting, reproduction on microfilm or in any other way, and 
storage in data banks. Duplication of this publication or parts thereof is permitted only 
under the provisions of the German Copyright Law of September 9, 1965, in its current 
version, and permission for use must always be obtained from Springer-Verlag Berlin
Heidelberg GmbH. Violations are liable for prosecution under the German Copyright Law. 



springeronline.com 

© Springer-Verlag Berlin Heidelberg 2001, 2003, 2004 

Originally published by Springer-Verlag Berlin Heidelberg New York in 2004 

Softcover reprint of the hardcover 2nd edition 2004 

The use of general descriptive names, trademarks, etc. in this publication does not imply, 
even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and therefore free for general use. 

Illustrations: Ingrid Zámečníková
Cover Design: KünkelLopka Werbeagentur

Typesetting: Camera ready by author 

Printed on acid-free paper 45/3142/PS - 5 4 3 2 1 0 




You have been told also that life is darkness, 

and in your weariness you echo what was said by the weary. 



And I say that life is indeed darkness 
save when there is urge, 

And all urge is blind save when there is knowledge, 

And all knowledge is vain save where there is work, 

And all work is empty save when there is love; 

And when you work with love you bind yourself 
to yourself, and to one another, and to God . . . 



Work is love made visible. 



And if you cannot work with love but only with distaste,
it is better that you should leave your work 
and sit at the gate of the temple and take alms of those 
who work with joy. 



Kahlil Gibran 
The Prophet 




Preface to the Second, Enlarged Edition 




The term algorithm is the central notion of computer science, and algorithmics
is one of the few fundamental kernels of theoretical computer science. Recent
developments confirm this claim. Hardly any other area of theoretical computer
science has been as lively in recent years, or has achieved comparably deep
progress and breakthroughs as fascinating as the PCP-theorem and efficient
algorithms for primality testing. The most exciting developments have taken
place precisely in the field of algorithmics for hard problems, which is the
topic of this book.

The goal of this textbook is to give a transparent, systematic introduction 
to the concepts and to the methods for designing algorithms for hard problems. 
Simplicity is the main educational characteristic of this textbook. All ideas, 
concepts, algorithms, analyses, and proofs are first explained in an informal 
way in order to develop the right intuition, and then carefully specified in 
detail. Following this strategy we preferred to illustrate the algorithm design 
methods using the most transparent examples rather than to present the best, 
but too technical, results. The consequence is that there are sections where 
the first edition of this book does not go deep enough for advanced courses. 

To remedy this drawback in the second edition, we extended the material
for some of the topics of central interest: randomized algorithms for primality
testing and applications of linear programming in the design of approximation
algorithms. This second edition contains both the Solovay-Strassen algorithm
and the Miller-Rabin algorithm for primality testing with a self-contained
analysis of their behavior (error probability). In order to give all related
details, we extended the section about algebra and number theory accordingly.
To explain the power of the method of relaxation to linear programming,
we added the concept of LP-duality and the presentation of the primal-dual
method. As an introduction to this topic we used the Ford-Fulkerson
pseudo-polynomial-time algorithm for the maximum flow problem, which is
presented in the section about pseudo-polynomial-time algorithms.
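To give a flavor of the randomized primality tests mentioned above, the following is a minimal Python sketch of the Miller-Rabin test viewed as a one-sided-error Monte Carlo algorithm. It is a standard textbook formulation written for this illustration, not the presentation used in the book itself, and the function name `is_probable_prime` and the parameter `rounds` are hypothetical. An answer False is always correct. For a composite n, each round finds a witness of compositeness with probability at least 3/4, so the error probability is at most 4^(-rounds).

```python
import random


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin test (one-sided error).

    False is always correct; True is wrong with probability at most
    4**(-rounds) when n is composite.
    """
    if n < 2:
        return False
    # Settle small primes and numbers with a small prime factor directly.
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)  # random candidate witness
        x = pow(a, d, n)                # a**d mod n
        if x == 1 or x == n - 1:
            continue                    # a is not a witness
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break                   # a is not a witness
        else:
            return False                # a is a witness: n is composite
    return True
```

For example, `is_probable_prime(2**61 - 1)` returns True, since 2^61 − 1 is a (Mersenne) prime and the test never errs on primes.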

In addition to extending some parts of the book, numerous small improvements
and corrections were made. I am indebted to all those who sent
me their comments and suggestions. In particular, I would like to thank Dirk
Bongartz, Hans-Joachim Bockenhauer, David Buttgereit, Thomas Deselaers,
Bernd Hentschel, Frank Kehren, Thorsten Uthke, Jan van Leeuwen, Sebastian
Seibert, Koichi Wada, Manuel Wahle, Dieter Weckauf, and Frank Wessel, who
carefully read and commented on large parts of this book. Special thanks go
to Dirk Bongartz and Hans-Joachim Bockenhauer for fruitful discussions on
new parts of the book and for their valuable suggestions. The expertise of
our LaTeX experts Markus Mohr and Manuel Wahle was very helpful and is
much appreciated. The excellent cooperation with Ingeborg Mayer and Alfred
Hofmann from Springer-Verlag is gratefully acknowledged.

Last but not least I would like to express my deepest thanks to Peter 
Widmayer for encouraging me to make the work on this book a never-ending 
story. 



Aachen, October 2002 Juraj Hromkovic 




Preface 




Algorithmic design, especially for hard problems, is more essential for success 
in solving them than any standard improvement of current computer tech- 
nologies. Because of this, the design of algorithms for solving hard problems 
is the core of current algorithmic research from the theoretical point of view as 
well as from the practical point of view. There are many general textbooks on 
algorithmics, and several specialized books devoted to particular approaches 
such as local search, randomization, approximation algorithms, or heuristics. 
But there is no textbook that focuses on the design of algorithms for hard 
computing tasks, and that systematically explains, combines, and compares 
the main possibilities for attacking hard algorithmic problems. As this topic 
is fundamental for computer science, this book tries to close this gap. 

Another motivation, and probably the main reason for writing this book, 
is connected to education. The considered area has developed very dynamically
in recent years, and the research on this topic has produced several profound
results, new concepts, and new methods. Some of the achieved contributions 
are so fundamental that one can speak about paradigms which should be in- 
cluded in the education of every computer science student. Unfortunately, this 
is very far from reality. This is because these paradigms are not sufficiently 
known in the computer science community, and so they are insufficiently com- 
municated to students and practitioners. The main reason for this unpleasant 
situation is that simple explanations and transparent presentations of the new 
contributions of algorithmics and complexity theory, especially in the area of 
randomized and approximation algorithms, are missing on the level of text- 
books for introductory courses. This is the typical situation when principal 
contributions, whose seeping into the folklore of the particular scientific dis- 
cipline is only a question of time, are still not recognized as paradigms in the 
broad community, and even considered to be too hard and too special for ba- 
sic courses by non-specialists in this area. Our aim is to try to speed up this 
transformation of paradigmatic research results into educational folklore. 

This book should provide a “cheap ticket” to algorithmics for hard
problems. Cheap does not mean that the material presented in this introductory
text is not explained precisely, in detail, and in its context, but that it is
presented as transparently as possible, and formalized by using mathematics
that is as simple as possible for this purpose. Thus, the main goal of this book
can be formulated as the following optimization problem.

Input: A computer science student or a practitioner 

Constraints: 

• To teach the input the main ideas, concepts, and algorithm design 
techniques (such as pseudo-polynomial-time algorithms, parame- 
terized complexity, local search, branch-and-bound, relaxation to 
linear programming, randomized algorithms, approximation algo- 
rithms, simulated annealing, genetic algorithms, etc.) for solving 
hard problems in a transparent and well-understandable way. 

• To explain the topic on the level of clear, informal ideas as well as 
on the precise formal level, and to be self-contained with respect 
to all mathematics used. 

• To discuss the possibilities to combine different methods in order 
to attack specific hard problems as well as a possible speedup by 
parallelization. 

• To explain methods for theoretical and experimental comparisons 
of different approaches to solving particular problems. 

Costs: The expected time that an input needs to learn the topic of the book
(particularly, the level of abstraction of the mathematics used and the
hardness of the mathematical proofs).

Objective: Minimization. 

I hope that this book provides a feasible solution to this hard optimization 
problem. To judge the quality (approximation ratio) of the solution provided 
in this book is left to the reader. 

I would like to express my deepest thanks to Hans-Joachim Bockenhauer, 
Erich Valkema, and Koichi Wada for carefully reading the whole manuscript 
and for their numerous comments and suggestions. I am indebted to Ivana 
Cerna, Vladimir Cerny, Alexander Ferrein, Ralf Klasing, Dana Pardubska, 
Hartmut Schmeck, Georg Schnitger, Karol Tauber, Ingo Wegener, and Peter 
Widmayer for interesting discussions or their comments on earlier drafts of 
this book. Special thanks go to Hans Wossner and the team of Springer- 
Verlag for their excellent assistance during the whole process of the manuscript 
preparation. The expertise and helpfulness of our LaTeX expert Alexander
Ferrein was very useful and is much appreciated. 

Last but not least I would like to thank Tanja for her patience with me 
during the work on this book. 



Aachen, March 2001 



Juraj Hromkovic 




Contents 



1 Introduction 1 

2 Elementary Fundamentals 11 

2.1 Introduction 11 

2.2 Fundamentals of Mathematics 13 

2.2.1 Linear Algebra 13 

2.2.2 Combinatorics, Counting, and Graph Theory 30 

2.2.3 Boolean Functions and Formulae 45 

2.2.4 Algebra and Number Theory 54 

2.2.5 Probability Theory 80 

2.3 Fundamentals of Algorithmics 93 

2.3.1 Alphabets, Words, and Languages 93 

2.3.2 Algorithmic Problems 97 

2.3.3 Complexity Theory 114 

2.3.4 Algorithm Design Techniques 134 

3 Deterministic Approaches 149 

3.1 Introduction 149 

3.2 Pseudo-Polynomial-Time Algorithms 152 

3.2.1 Basic Concept 152 

3.2.2 Dynamic Programming and Knapsack Problem 154 

3.2.3 Maximum Flow Problem and Ford-Fulkerson Method 157 

3.2.4 Limits of Applicability 167 

3.3 Parameterized Complexity 169 

3.3.1 Basic Concept 169 

3.3.2 Applicability of Parameterized Complexity 171 

3.3.3 Discussion 174 

3.4 Branch-and-Bound 175 

3.4.1 Basic Concept 175 

3.4.2 Applications for MAX-SAT and TSP 177

3.4.3 Discussion 183 







3.5 Lowering Worst Case Complexity of Exponential Algorithms 184 

3.5.1 Basic Concept 184 

3.5.2 Solving 3SAT in Less than 2^n Complexity 185

3.6 Local Search 189 

3.6.1 Introduction and Basic Concept 189 

3.6.2 Examples of Neighborhoods and 

Kernighan-Lin’s Variable-Depth Search 193 

3.6.3 Tradeoffs Between Solution Quality and Complexity 198 

3.7 Relaxation to Linear Programming 209 

3.7.1 Basic Concept 209 

3.7.2 Expressing Problems as Linear Programming Problems 211

3.7.3 The Simplex Algorithm 218 

3.7.4 Rounding, LP-Duality and Primal-Dual Method 227 

3.8 Bibliographical Remarks 243 

4 Approximation Algorithms 247 

4.1 Introduction 247 

4.2 Fundamentals 248 

4.2.1 Concept of Approximation Algorithms 248 

4.2.2 Classification of Optimization Problems 253 

4.2.3 Stability of Approximation 253 

4.2.4 Dual Approximation Algorithms 258 

4.3 Algorithm Design 260 

4.3.1 Introduction 260 

4.3.2 Cover Problems, Greedy Method, 

and Relaxation to Linear Programming 261 

4.3.3 Maximum Cut Problem and Local Search 269 

4.3.4 Knapsack Problem and PTAS 272 

4.3.5 Traveling Salesperson Problem and 

Stability of Approximation 282 

4.3.6 Bin-Packing, Scheduling, and 

Dual Approximation Algorithms 308 

4.4 Inapproximability 316 

4.4.1 Introduction 316 

4.4.2 Reduction to NP-Hard Problems 317 

4.4.3 Approximation-Preserving Reductions 319 

4.4.4 Probabilistic Proof Checking and Inapproximability 329 

4.5 Bibliographical Remarks 337 

5 Randomized Algorithms 341 

5.1 Introduction 341 

5.2 Classification of Randomized Algorithms and Design Paradigms 343

5.2.1 Fundamentals 343 

5.2.2 Classification of Randomized Algorithms 345 

5.2.3 Paradigms of Design of Randomized Algorithms 359 







5.3 Design of Randomized Algorithms 363 

5.3.1 Introduction 363 

5.3.2 Quadratic Residues, Random Sampling, and Las Vegas 364

5.3.3 Primality Testing, Abundance of Witnesses, 

and One-Sided-Error Monte Carlo 369 

5.3.4 Equivalence Tests, Fingerprinting, and Monte Carlo 385 

5.3.5 Randomized Optimization Algorithms for MIN-CUT 392

5.3.6 MAX- SAT and Random Rounding 400 

5.3.7 3SAT and Randomized Multistart Local Search 406 

5.4 Derandomization 411 

5.4.1 Fundamental Ideas 411 

5.4.2 Derandomization by the Reduction 

of the Probability Space Size 413 

5.4.3 Probability Space Reduction and MAX-EkSAT 418

5.4.4 Derandomization by the Method 

of Conditional Probabilities 420 

5.4.5 Conditional Probabilities and Satisfiability 422 

5.5 Bibliographical Remarks 426 

6 Heuristics 431 

6.1 Introduction 431 

6.2 Simulated Annealing 433 

6.2.1 Basic Concept 433 

6.2.2 Theory and Experience 437 

6.2.3 Randomized Tabu Search 441 

6.3 Genetic Algorithms 444 

6.3.1 Basic Concept 444 

6.3.2 Adjustment of Free Parameters 452 

6.4 Bibliographical Remarks 457 

7 A Guide to Solving Hard Problems 461 

7.1 Introduction 461 

7.2 Taking over an Algorithmic Task or a Few Words about Money 462

7.3 Combining Different Concepts and Techniques 463 

7.4 Comparing Different Approaches 466 

7.5 Speedup by Parallelization 468 

7.6 New Technologies 477 

7.6.1 Introduction 477 

7.6.2 DNA Computing 479 

7.6.3 Quantum Computing 486 

7.7 Glossary of Basic Terms 492 

References 503

Index 525




1 



Introduction 




“The advanced reader who skips parts

that appear too elementary

may miss more than

the reader who skips parts

that appear too complex.”

G. Polya 



Motivation and Aims 

This textbook provides a “cheap ticket” to the design of algorithms for hard
computing problems, i.e., for problems for which no low-degree polynomial-time
algorithms¹ are known. It focuses on a systematic presentation of
the fundamental concepts and algorithm design techniques such as
pseudo-polynomial-time algorithms, parameterized complexity, branch-and-bound,
local search, lowering the worst case complexity of exponential algorithms, dual
approximation algorithms, stability of approximation, randomization (foiling
an adversary, abundance of witnesses, fingerprinting, random sampling, random
rounding), derandomization, simulated annealing, tabu search, genetic
algorithms, etc. The presentation of these concepts and techniques starts with
some fundamental informal ideas that are later successively specified in
detail. The algorithms used to illustrate the application of these methods are
chosen with respect to their simplicity and transparency rather than with
respect to their quality (complexity and reliability). The methods for the design
of algorithms are not only presented in a systematic way; they are also
combined, compared, and parallelized in order to produce a practical algorithm
for the given application. An outlook on possible future technologies such as
DNA computing and quantum computing is provided, too.

¹ We call attention to the fact that we do not restrict our interpretation of hardness
to NP-hardness in this book. Problems like primality testing, which are not known
to be NP-hard (and for which no practical deterministic polynomial-time algorithm
is known), are in the center of our interest, too.







The main motivation to write this textbook is related to education. The
area of algorithmics for hard problems has developed very dynamically in
recent years, and research on this topic has produced several profound results,
new concepts, and new methods. Some of the achieved contributions are so
fundamental that one can speak about paradigms which should be conveyed
to every computer scientist. The main aim of this textbook is to try to speed
up the transformation of these fundamental research results
into educational folklore.

To the Students and Practitioners 

Welcome. The textbook is primarily addressed to you, and its style follows
this purpose. This book covers a fascinating topic that shows the importance
of theoretical knowledge for solving practical problems. Several of you
consider theory to be boring, irrelevant for computer scientists, and too
hard in comparison with courses in applied areas of computer science and
engineering. Still worse, some of you may view the courses in theory only
as a troublesome threshold that must be overcome in order to obtain some
qualification certificates. This book tries to show that the opposite of this
judgment is true, i.e., that the theory involves exciting ideas that have a
direct, transparent connection with practice, and that they are understandable.
The realization of this task here is not very difficult because algorithmics,
in contrast to several other theoretical areas, has a simple, direct relation to
applications.² Moreover, we show in this book that several practical problems
can be solved only due to some nontrivial theoretical results, and so
success in many applications strongly depends on the theoretical know-how
of the algorithm designers.³ The most fascinating effect occurs when one can
jump from millions of years of computer work necessary to execute any naive
algorithm to a matter of a few seconds due to an involved theoretical
concept. Another crucial fact is that the use of profound theoretical results does
not necessarily need to be connected with a difficult study of some complex,
abstract, and mysterious mathematics. Mathematics is used here as a formal
language and as a tool, but not as a mysterious end in itself. We show here
that a very simple formal language on a low level of mathematical abstraction
is often sufficient to clearly formulate the main ideas and to prove useful
assertions. Simplicity is the main educational characteristic of this textbook.
All ideas, concepts, algorithms, and proofs are first presented in an informal
way and then carefully specified in detail. Progressively difficult topics are
explained in a step-by-step manner.

² That is, one does not need to go a long and complicated way in order to show
the relevance and usefulness of theoretical results in some applications.

³ A nice representative example is primality testing, for which no practical
deterministic polynomial-time algorithm is known. This problem is crucial for
public-key cryptography, where one needs to generate large primes. Only due to
nontrivial results and concepts of number theory, probability theory, and
algorithmics was a practical algorithm for this task developed.

At the end of each section we placed an informal summary that once again
calls attention to the main ideas and results of the section. After reading
a particular section you should compare your understanding of it with its
summary in order to check whether you may have missed some important
idea. If new terminology is introduced in a section, the new terms are listed
at the end of that section, too.

We hope that this textbook provides you with an enjoyable introduction
to the modern theory of algorithmics, and that you will learn paradigms that,
in contrast to specific technical knowledge (which is useful now but becomes
outdated within a few years), may remain useful for several decades or even
become the core of future progress.



To the Teachers 

This book is based on the courses Algorithms and Approximation Algorithms
and Randomized Algorithms held regularly at the Christian-Albrechts-University
of Kiel and the Technological University (RWTH) Aachen. The
course Algorithms is a basic course on algorithmics that starts with classical
algorithm design techniques but at its core mainly focuses on solving
hard problems. Note that hard does not mean only NP-hard here; rather,
any problem that does not admit any low-degree polynomial-time algorithm
is considered to be hard. Section 2.3, Chapter 3 about deterministic
approaches for designing efficient algorithms, and Chapter 6 about heuristics
are completely contained in this course. Moreover, the fundamental ideas and
concepts of approximation algorithms (Chapter 4) and randomized algorithms
(Chapter 5), along with the simplest algorithms illustrating these concepts, are
included in this fundamental course, too. The advanced course Approximation
Algorithms and Randomized Algorithms is based on Chapters 4 and 5. The
hardest parts of this course are Section 4.4 about inapproximability
and Section 5.4 about derandomization, so one may choose not to
teach some subparts of these sections. On the other hand, one can
add some more involved analyses of the behavior of randomized heuristics such
as simulated annealing and genetic algorithms. Chapter 7 is devoted to
combining, comparing, and parallelizing algorithms designed by different methods,
and it should to some extent be included in both courses. The fundamentals
of DNA computing and of quantum computing (Section 7.6) can be presented
in the form of an outlook on possible future technologies at the end of the
advanced course.

We preferred the direct placement of the exercises into the text rather
than creating sections consisting of exercises only. The exercises presented
here are strongly connected with the parts in which they are placed, and they
help to understand or deepen the material presented directly above them. If
the solution to an exercise requires some nontrivial new ideas that are not
contained in this book, then the exercise is marked with W. Obviously, this
mark is relative to the knowledge of the reader. Note that thorough
training in the presented topics requires formulating additional exercises.

The book is self-contained, and so all mathematics needed for the design
and analysis of the presented algorithms can be found in Section 2.2. Reading
additional literature is necessary only if one wants to deepen her/his
knowledge of some of the presented or related topics. We call attention to the
fact that this book is an introduction to algorithmics for hard problems, and so
reading related literature is highly recommended for everybody who wants
to become an expert in this field. The pointers to the corresponding literature
in the bibliographical remarks at the end of every chapter should be
helpful for this purpose.

There are three main educational features of this material. First, we
develop a systematic presentation of the topic. To achieve this we not only
revised some known concepts, but also gave names to concepts that had not been
recognized as concepts in this area so far. Secondly, each particular part of
this book starts with an informal, intuitive explanation, continues with a
detailed formal specification and examples, and finishes again with an informal
discussion (i.e., one returns to the starting point and confronts the
initial aims with their realization). The third feature is simplicity. Since the
use of mathematics is unavoidable for this topic, we tried to use formalisms that
are as simple as possible and on a low level of abstraction. Also in choosing the
algorithms for illustrating the particular concepts and design techniques we
preferred to show a transparent application of the considered method rather
than to present the best (but technical) algorithm for a particular problem.
All technical proofs start with a simple proof idea, which is then carefully
worked out down to the smallest details.

In the Preface we formulated the main goal of this book as an optimization 
problem of learning algorithmics for hard problems in minimal time. Taking 
a teacher instead of a student as the input, one can reformulate this problem 
as minimizing time for preparing a course on this topic. We hope that this 
textbook will not only save you time when preparing such a course, but that 
it will encourage you to give such courses or use some part of this book in 
other courses, even if algorithmics is not the main topic of your interest. Most 
algorithms presented here are jewels of algorithm theory and several of them 
can be successfully used to illustrate paradigmatic ideas and concepts in other 
areas of computer science, too. 



Organization 

This book consists of three parts. The first part is presented in Chapter 2, and it
contains elementary fundamentals of mathematics and algorithmics as usually
taught in undergraduate courses. Chapters 3, 4, 5, and 6 are devoted to the
proper subject of this textbook, i.e., to a systematic presentation of methods
for the design of efficient algorithms for hard problems. The third part is
covered in Chapter 7 and serves as a guide in the field of applications
of the methods presented in the second part. In what follows we give more
details about the contents of the particular chapters.

Chapter 2 consists of two main sections, namely Fundamentals of Mathe- 
matics and Fundamentals of Algorithmics . The aim of the first section is to 
make this book self-contained in the sense that all formal concepts and argu- 
ments needed to design and analyze algorithms presented in Chapters 3, 4, 
5, and 6 can be explained in detail. The elementary fundamentals of linear 
algebra, Boolean logic, combinatorics, graph theory, algebra, number theory, 
and probability theory are presented here. The only part that is a little bit 
more difficult is devoted to the proofs of some fundamental results of num- 
ber theory that are crucial for the design of some randomized algorithms in 
Chapter 5. In fact, we assume that the reader is familiar with the topic of 
the section on fundamentals of mathematics. We do not recommend reading 
this section before starting to read the subsequent chapters devoted to the 
central topic. It is better to skip this section and to look up specific results if 
one needs them for the understanding of particular algorithms. Section Fun- 
damentals of Algorithmics explains the fundamental ideas and concepts of 
algorithm and complexity theory and fixes the notation used in this book. For 
an advanced reader it may also be useful to read this part because it presents 
the fundamentals of the philosophy of this book. 

Chapters 3, 4, 5, and 6 are devoted to a systematic presentation of funda- 
mental concepts and algorithm design techniques for solving hard problems. 
Here, we carefully distinguish between the term concept and the term algorithm
design technique. Algorithm design techniques such as divide-and-conquer,
dynamic programming, branch-and-bound, local search, simulated annealing, 
etc., have a well specified structure that even provides a framework for possible 
implementations or parallelization. Concepts such as pseudo-polynomial-time 
algorithms, parameterized complexity, approximation algorithms, randomized 
algorithms, etc., formulate ideas and rough frameworks about how to attack 
hard algorithmic problems. Thus, to realize a concept for attacking hard prob- 
lems in a particular case one needs to apply an algorithm design technique 
(or a combination of different techniques). 

Chapter 3, “Deterministic Approaches”, is devoted to deterministic methods
for solving hard problems. It consists of six basic sections. Three sections
are devoted to the classical algorithm design techniques for attacking
hard problems, namely branch-and-bound, local search, and relaxation to linear
programming, and three sections are devoted to the concepts of
pseudo-polynomial-time algorithms, parameterized complexity, and lowering the
worst case exponential complexity. All sections are presented in a uniform way.
First, the corresponding method (concept or algorithm design technique) is
presented and formally specified. Then, the method is illustrated by designing
some algorithms for specific hard problems, and the limits of this method are
discussed.







The concepts of pseudo-polynomial-time algorithms and parameterized
complexity are based on the idea of partitioning the set of all input instances
of a particular hard problem into a set of easy problem instances and a set of
hard problem instances, and of designing an efficient algorithm for the easy
input instances. To illustrate the concept of pseudo-polynomial-time algorithms
(Section 3.2) we present the well-known dynamic programming algorithm⁴ for
the knapsack problem. We use the concept of strong NP-hardness for proving
the non-existence of pseudo-polynomial-time algorithms for some hard
problems. The concept of parameterized complexity (Section 3.3) is presented
here as a generalization of the concept of pseudo-polynomial-time algorithms. To
illustrate it we present two algorithms for the vertex cover problem. Strong
NP-hardness is again used for showing limits to the applicability of this
concept.
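A minimal Python sketch of a pseudo-polynomial-time dynamic program for the (0/1) knapsack problem, assuming integer weights and an integer capacity. This capacity-indexed table is one standard variant; the algorithm in Section 3.2.2 may organize its table differently (for instance, by profit), and the function name is hypothetical. Its running time O(n · capacity) is polynomial in n and in the *value* of the capacity, but exponential in the length of the binary representation of the capacity, which is exactly what makes it pseudo-polynomial.

```python
from typing import Sequence


def knapsack_max_profit(weights: Sequence[int], profits: Sequence[int], capacity: int) -> int:
    """Maximum total profit of a subset of items with total weight <= capacity."""
    # best[c] = best profit achievable with total weight at most c,
    # using only the items processed so far.
    best = [0] * (capacity + 1)
    for w, p in zip(weights, profits):
        # Go through capacities downwards so that each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + p)
    return best[capacity]
```

For weights (1, 3, 4, 5), profits (1, 4, 5, 7), and capacity 7, the best choice is the two items of weight 3 and 4, with total profit 9.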

Branch-and-bound (Section 3.4) is a classical algorithm design technique
used for solving optimization problems, and we chose the maximum satisfiability
problem and the traveling salesperson problem to illustrate its work and
properties. The use of branch-and-bound in combination with other concepts
and techniques (approximation algorithms, relaxation to linear programming,
heuristics, etc.) is discussed.

The concept of lowering the worst-case complexity of exponential algorithms
(Section 3.5) is based on designing algorithms with an exponential complexity
in $O(c^n)$ for some $c < 2$. Such algorithms may be practical even for large input
sizes. This concept has been especially successful for satisfiability problems,
and we present a simple algorithm from this area. An advanced application of
this concept is postponed to Chapter 5 about randomized algorithms.

Section 3.6 presents the basic framework of local search and of Kernighan-Lin’s
variable-depth search. These techniques are used for attacking optimization
problems and provide local optima according to the chosen local
neighborhoods. The notions of a polynomial-time searchable neighborhood and of
an exact neighborhood are introduced in order to study the tradeoffs between
solution quality and computational complexity of local search. Some input
instances of the traveling salesperson problem that are pathological for local
search are presented. Pathological means that, even for a neighborhood of
exponential size, the instances have a unique optimal solution and
exponentially many second-best local optima whose costs are exponential in
the optimal cost.

Section 3.7 is devoted to the technique of relaxation to linear programming
that is used for solving optimization problems. The realization of this
technique is presented here in three steps. The first step (reduction) consists
of expressing a given instance of an optimization problem as an input instance
of integer programming. For its illustration we use the minimum weighted
vertex cover problem and the knapsack problem. The second step (relaxation)
consists of solving the instance of integer programming as an instance of
linear programming (i.e., dropping the integrality constraints). A short,
transparent presentation of the simplex algorithm is given in order to show
one way this step can be performed. The third step uses the computed optimal
solution to the instance of linear programming for computing a high-quality
feasible solution to the original instance of the given optimization problem.
The concepts of rounding, LP-duality, and the primal-dual method are
considered here for this purpose.

4 This algorithm is later used to design an approximation algorithm (FPTAS).

The main goals of Chapter 4 “Approximation Algorithms” are the following:

(1) To give the fundamentals of the concept of the approximation of
optimization problems.

(2) To present some transparent examples of the design of efficient
approximation algorithms.

(3) To show basic methods for proving limits of the applicability of the concept
of approximation (i.e., methods for proving lower bounds on polynomial-time
inapproximability).

After a short introduction (Section 4.1), Section 4.2 introduces the basic
concept of polynomial-time approximability. This concept shows one of the
most fascinating effects occurring in algorithmics. One can jump from a huge,
inevitable amount of physical work to a few seconds of work on a PC due to a
small change in the requirements: instead of an optimal solution one demands
a solution whose cost differs from the cost of an optimal solution by at most
ε% of the cost of an optimal solution, for some ε > 0. Besides introducing basic
terms such as relative error, approximation ratio, approximation algorithm,
and approximation scheme, a classification of optimization problems with
respect to the quality of their polynomial-time approximability is given. In
many cases, one can use the standard concepts and algorithm design techniques
to design approximation algorithms. Section 4.2 specifies two additional
concepts (dual approximation algorithms and stability of approximation) that
were developed specifically for the design of approximation algorithms.

Section 4.3 is devoted to the design of particular approximation algorithms.
Section 4.3.2 presents a simple 2-approximation algorithm for the vertex cover
problem and then applies the technique of relaxation to linear programming
in order to obtain the approximation ratio 2 for the weighted generalization of
this problem, too. Further, the greedy technique is applied to design a
ln(n)-approximation algorithm for the set cover problem. Section 4.3.3 shows
that a simple local search algorithm achieves an approximation ratio of at
most 2 for the maximum cut problem. In Section 4.3.4 we first use a combination
of the greedy method and an exhaustive search to design a polynomial-time
approximation scheme for the simple knapsack problem. Using the concept
of stability of approximation, this approximation scheme is extended to work
for the general knapsack problem. Finally, using the pseudo-polynomial-time
algorithm for the knapsack problem from Section 3.2, we design a fully
polynomial-time approximation scheme for the knapsack problem. Section 4.3.5
is devoted to the traveling salesperson problem (TSP) and to the concept of
stability of approximation. First, the spanning tree algorithm and the
Christofides algorithm for the metric TSP are presented. Since the general TSP
does not admit any polynomial-time approximation algorithm with a bounded
approximation ratio (unless P = NP), we use the concept of stability of
approximation in order to partition the set of all instances of TSP into an
infinite spectrum of classes with respect to their polynomial-time
approximability. The approximation ratios of these classes range from 1 to
infinity. In Section 4.3.6 an application of the concept of dual approximation
algorithms is presented. The design of a dual polynomial-time approximation
scheme for the bin-packing problem is used to obtain a polynomial-time
approximation scheme for the makespan scheduling problem.

Section 4.4 is devoted to methods for proving lower bounds on the
polynomial-time approximability of specific optimization problems. We
partition these methods into three groups. The first group contains the
classical reduction to NP-hard decision problems. The second group is based on
specific reductions (between optimization problems) that preserve the quality
of solutions. We consider the approximation-preserving reduction and the
gap-preserving reduction here. The third group is based on a direct application
of the famous PCP-Theorem.

The main goals of Chapter 5 “Randomized Algorithms” are the following:

(1) To give the fundamentals of the concept of randomized computation and
to classify randomized algorithms with respect to their error probability.

(2) To specify the paradigms of the design of randomized algorithms and to
show how they can be applied for solving specific problems.

(3) To explain some derandomization methods for converting randomized
algorithms into deterministic ones.

After a short introduction (Section 5.1), Section 5.2 presents the
fundamentals of randomized computing and classifies randomized algorithms into
Las Vegas algorithms and Monte Carlo algorithms. Las Vegas algorithms never
provide a wrong result, and Monte Carlo algorithms are further classified with
respect to the size and the character of their error probability. Moreover, the
paradigms of foiling an adversary, abundance of witnesses, fingerprinting,
random sampling, relaxation, and random rounding for the design of randomized
algorithms are specified and discussed.

Section 5.3 is devoted to applications of paradigms of randomized
computations to the design of concrete randomized algorithms. The technique of
random sampling is used to design a simple Las Vegas algorithm for the
problem of finding a quadratic nonresidue in $\mathbb{Z}_p$ for a given prime p in
Section 5.3.2. Applying the technique of abundance of witnesses, we design the
well-known one-sided-error Monte Carlo algorithms (namely the Solovay-Strassen
algorithm and the Miller-Rabin algorithm) for primality testing in Section 5.3.3.
We also provide detailed self-contained proofs of their correctness there. In
Section 5.3.4 it is shown how the fingerprinting technique can be used to
efficiently decide the equivalence of two polynomials over $\mathbb{Z}_p$ for a prime p
and the equivalence of two one-time-only branching programs. The concept of
randomized optimization algorithms is illustrated on the minimum cut problem
in Section 5.3.5. The methods of random sampling and relaxation to linear
programming with random rounding are used to design a randomized
approximation algorithm for the maximum satisfiability problem in Section 5.3.6.
Randomization, local search, and the concept of lowering the worst-case
complexity are combined in order to design a Monte Carlo $O(1.334^n)$-algorithm
for the satisfiability problem of formulas in 3-conjunctive normal form in
Section 5.3.7.

Section 5.4 is devoted to derandomization. The method of the reduction of 
the probability space and the method of conditional probabilities are explained 
and illustrated by derandomizing the randomized approximation algorithms 
of Section 5.3.6 for the maximum satisfiability problem. 

Chapter 6 is devoted to heuristics. Here, a heuristic is considered to be 
a robust technique for the design of randomized algorithms for which one is 
not able to guarantee at once the efficiency and the quality (correctness) of 
the computed solutions, not even with any bounded probability. We focus 
on the presentation of simulated annealing and genetic algorithms here. We 
provide formal descriptions of the schemes of these techniques, and discuss 
their theoretical convergence as well as experimental adjustment of their free 
parameters. Randomized tabu search as a possible generalization of simulated 
annealing is considered, too. 

Chapter 7 provides a guide to solving hard problems. Among others, we 
discuss the possibilities 

• to combine different concepts and techniques in the process of algorithm 
design, 

• to compare algorithms designed by different methods for the same prob- 
lems with respect to their complexity as well as with respect to the solution 
quality, and 

• to speed up designed algorithms by parallelization. 

Besides this, Section 7.6 provides an outlook on DNA computing and quan- 
tum computing as hypothetical future technologies. To illustrate these compu- 
tation modes we present DNA algorithms for the Hamiltonian path problem 
(Adleman’s experiment) and for the 3-colorability problem and a quantum al- 
gorithm for generating truly random bits. Chapter 7 finishes with a dictionary 
of basic terms of algorithmics for hard problems. 




2 Elementary Fundamentals




“One whose knowledge
is not constantly increasing
is not clever at all”

Jean Paul 



2.1 Introduction 

We assume that the reader has had undergraduate courses in mathematics and 
algorithmics. Despite this assumption we present all elementary fundamentals 
needed for the rest of this book in this chapter. The main reasons to do this 
are the following ones: 

(i) to make the book completely self-contained in the sense that all arguments 
needed to design and analyze the algorithms presented in the subsequent 
chapters are explained in the book in detail, 

(ii) to explain the mathematical considerations that are essential in the pro- 
cess of algorithm design, 

(iii) to informally explain the fundamental ideas of complexity and algorithm 
theory and to present their mathematical formalization, and 

(iv) to fix the notation in this book. 

We do not recommend reading this whole chapter before starting to read
the subsequent chapters devoted to the central topic of this book. We
recommend that every reader with at least a little experience skip the part
about the mathematics and look up specific results later, when they are needed
for a particular algorithm design. On the other hand, it is reasonable to read
the part about the elementary fundamentals of algorithms, because it is
strongly connected with the philosophy of this book and the basic notation is
fixed there. More detailed recommendations about how to use this chapter are
given below.

This chapter is divided into two main parts, namely “Fundamentals of
Mathematics” and “Fundamentals of Algorithmics”. The part devoted to
elementary mathematics consists of five sections. Section 2.2.1 contains
elementary fundamentals of linear algebra. It focuses on systems of linear
equations, matrices, vector spaces, and their geometrical interpretations. The
concepts and results presented here are used in Section 3.7 only, where the
problem of linear programming and the simplex method are considered. The
reader can therefore look at this section when some considerations of Section
3.7 are not clear enough. Section 2.2.2 provides elementary fundamentals of
combinatorics, counting, and graph theory. Notions such as permutation and
combination are defined and some basic series are presented. Furthermore, the
notations $O$, $\Omega$, and $\Theta$ for the analysis of the asymptotic growth of
functions are fixed, and a version of the Master Theorem for solving recurrences
is presented. Finally, the basic notions (such as graph, directed graph, multigraph,
connectivity, Hamiltonian tour, and Eulerian tour) are defined. The content 
of Section 2.2.2 is the base for many parts of the book because it is used for 
the analysis of algorithm complexity as well as for the design of algorithms 
for graph-theoretical problems. Section 2.2.3 is devoted to Boolean functions 
and their representations in the forms of Boolean formulae and branching 
programs. The terminology and basic knowledge presented here are useful 
for investigating satisfiability problems, which belong to the paradigmatic al- 
gorithm problems of main interest. This is the reason why we consider these 
problems in all subsequent chapters about the algorithm design for hard
problems. Section 2.2.4 differs a little bit from the previous sections. While
the previous sections are mainly devoted to fixing terminology and presenting
some elementary knowledge, Section 2.2.4 also contains some nontrivial,
important results like the Fundamental Theorem of Arithmetic, the Prime Number
Theorem, Fermat’s Theorem, and the Chinese Remainder Theorem. We included the
proofs of these results, too, because the design of randomized algorithms for
problems of algorithmic number theory requires a thorough understanding of
this topic. If one is not familiar with the contents of this section, it is
recommended to look at it before reading the corresponding parts of Chapter 5.
Section 2.2.5 is devoted to the elementary fundamentals of probability theory. 
Here only discrete probability distributions are considered. The main point is 
that we do not only present fundamental notions like probability space, con- 
ditional probability, random variable, and expectation, but we also show their 
relation to the nature and to the analysis of randomized algorithms. Thus, 
this part is a prerequisite for Chapter 5 on randomized algorithms. 

Section 2.3 is devoted to the fundamentals of algorithm and complexity
theory. The main ideas and concepts connected with the primary topic of this
book are presented here. We strongly recommend taking at least a short look
at this part before starting to read the proper parts on algorithm design
techniques for hard problems in Chapters 3, 4, 5, and 6. Section 2.3 is
organized as follows. Section 2.3.1 gives the basic terminology of formal
language theory (such as alphabet, word, language). This is useful for the
representation of data and for the formal description of algorithmic problems
and basic concepts of complexity theory. In Section 2.3.2 the formal definitions
of all algorithmic problems considered in this book are given. There, we
concentrate on decision problems and optimization problems. Section 2.3.3
provides a short survey of the basic concepts of complexity theory as a theory
for the classification of algorithmic problems according to their computational
difficulty. The fundamental notions such as complexity measurement,
nondeterminism, polynomial-time reduction, verifier, and NP-hardness are
introduced and discussed in the framework of the historical development of
theoretical computer science. This provides the base for starting the effort to
solve hard algorithmic problems in the subsequent chapters. Finally, Section
2.3.4 gives a concise overview of the algorithm design techniques
(divide-and-conquer, dynamic programming, backtracking, local search, greedy
algorithms) that are usually taught in undergraduate courses on algorithmics.
All these techniques are later applied to attack particular hard problems. In
particular, it is reasonable to read this section before reading Chapter 3, where
these techniques are developed or combined with other ideas in order to
obtain solutions to different problems.



2.2 Fundamentals of Mathematics 

2.2.1 Linear Algebra 

The aim of this section is to introduce the fundamental notions of linear
algebra such as linear equations, matrices, vectors, and vector spaces and to
provide elementary knowledge about them. The terminology introduced here is
needed for the study of the problem of linear programming and for introducing
the simplex method in Section 3.7. Vectors are also often used to represent
data (inputs) of many computing problems, and matrices are used to represent
graphs, directed graphs, and multigraphs in several of the following sections.

In what follows, we consider the following fundamental sets:

$\mathbb{N} = \{0, 1, 2, \ldots\}$ ... the set of all natural numbers,

$\mathbb{Z} = \{0, -1, 1, -2, 2, \ldots\}$ ... the set of all integers,

$\mathbb{Q} = \{\frac{m}{n} \mid m, n \in \mathbb{Z}, n \neq 0\}$ ... the set of rational numbers,

$\mathbb{R}$ ... the set of real numbers,

$(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$ for all $a, b \in \mathbb{R}$, $a < b$,

$[a, b] = \{x \in \mathbb{R} \mid a \leq x \leq b\}$ for all $a, b \in \mathbb{R}$, $a < b$.

For every set S, the notation $\mathrm{Pot}(S)$ or $2^S$ denotes the power set of S, i.e.,

$\mathrm{Pot}(S) = \{Q \mid Q \subseteq S\}$.
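For a finite set S, the power set can be enumerated by listing the subsets of every size k = 0, 1, ..., |S|. The following is a minimal Python sketch; the function name `power_set` is our own and does not come from the text.

```python
from itertools import chain, combinations


def power_set(s):
    """Return Pot(S) for a finite set S as a list of frozensets."""
    items = list(s)
    # Subsets of each size k = 0, 1, ..., |S|; each subset appears exactly once.
    by_size = (combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(q) for q in chain.from_iterable(by_size)]


pot = power_set({1, 2, 3})
print(len(pot))  # 8, i.e., 2**3
```

The count also explains the notation $2^S$: a set with n elements has exactly $2^n$ subsets.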



An equation of the type 

$y = ax$,

expressing the variable y in terms of the variable x for a fixed constant a, is 
called a linear equation. The notion of linearity is connected to the geomet- 
rical interpretation that corresponds to a straight line. 




Definition 2.2.1. 1. Let S be a subset of JR such that if a, b £ S, then a+b £ S 
and a • b £ S. The equation 

y = a\X\ + a 2 x 2 H b a n x n , (2.1) 

where a±, . . . , a n are constants from S and x\,x 2 , . . . ,x n are variables over S 
is called a linear equation over S. We say that the equation (2.1) expresses 
y in terms of variables x\,x 2 , . . . ,x n . The variables xi, x 2 , . . . , x n are also 
called the unknowns of the linear equation (2.1). 

For a fixed value of y, a solution to the linear equation (2.1) is a sequence 
$i, 52, . . . , s n of numbers of S such that 

y = a\S\ + a 2 s 2 + • • • -b a n s n . 

For instance 1,1,1 (i.e., x\ — 1, x 2 — 1, x 3 = 1) is a solution to the linear 
equation 

xi + 2x 2 + 3 x 3 = 6 

over 7Z. Another solution is —1, —1, 3. 
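Checking whether a candidate sequence is a solution amounts to evaluating the right-hand side of (2.1). A minimal Python sketch (the helper name `is_solution` is hypothetical) that verifies both solutions above and rejects a non-solution:

```python
def is_solution(coeffs, y, s):
    """Check whether s_1, ..., s_n solves y = a_1*x_1 + ... + a_n*x_n."""
    if len(coeffs) != len(s):
        raise ValueError("need exactly one value per unknown")
    return sum(a * v for a, v in zip(coeffs, s)) == y


# The equation x1 + 2*x2 + 3*x3 = 6 over the integers.
print(is_solution([1, 2, 3], 6, [1, 1, 1]))    # True
print(is_solution([1, 2, 3], 6, [-1, -1, 3]))  # True
print(is_solution([1, 2, 3], 6, [1, 1, 2]))    # False (1 + 2 + 6 = 9)
```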

Definition 2.2.1.2. Let S be a subset of $\mathbb{R}$ such that if $a, b \in S$, then $a \cdot b \in S$
and $a + b \in S$. Let m and n be positive integers. A system of m
linear equations in n variables (unknowns) $x_1, \ldots, x_n$ over S (or
simply a linear system over S) is a set of m linear equations over S, where
each of these linear equations is expressed in terms of the same variables
$x_1, x_2, \ldots, x_n$. In other words, a system of m linear equations over S is

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= y_m,
\end{aligned}
\tag{2.2}
$$

where $a_{ij}$ are constants from S for $i = 1, \ldots, m$, $j = 1, \ldots, n$, and $x_1, x_2, \ldots, x_n$
are variables (unknowns) over S. For each $i \in \{1, \ldots, m\}$, the linear equation

$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = y_i$

is called the ith equation of the linear system (2.2). The system (2.2) of
linear equations is called homogeneous if $y_1 = y_2 = \cdots = y_m = 0$.

For given values of $y_1, y_2, \ldots, y_m$, a solution to the linear system (2.2) is
a sequence $s_1, s_2, \ldots, s_n$ of numbers of S such that each equation of the linear
system (2.2) is satisfied when $x_1 = s_1, x_2 = s_2, \ldots, x_n = s_n$ (i.e., such that
$s_1, s_2, \ldots, s_n$ is a solution for each equation of the linear system (2.2)).

The system of linear equations over $\mathbb{Z}$

$x_1 + 2x_2 = 10$
$2x_1 - 2x_2 = -4$
$3x_1 + 5x_2 = 26$

is a system of three linear equations in two variables $x_1$ and $x_2$. $x_1 = 2$ and
$x_2 = 4$ is a solution to this system.¹

Note that there are systems of linear equations that do not have any so- 
lution. An example is the following linear system: 

$x_1 + 2x_2 = 10$
$x_1 - x_2 = -2$
$6x_1 + 10x_2 = 40$
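Both claims can be checked mechanically. In the sketch below (helper names are hypothetical; Cramer's rule for two equations is our choice of method, not the text's), the first two equations of the second system force $x_1 = 2$, $x_2 = 4$, and that pair violates the third equation, so the system has no solution.

```python
from fractions import Fraction


def satisfies_system(rows, rhs, s):
    """True iff s satisfies every equation sum_j rows[i][j] * s[j] = rhs[i]."""
    return all(sum(a * v for a, v in zip(row, s)) == y for row, y in zip(rows, rhs))


def cramer_2x2(a, b, c, d, e, f):
    """Solve a*x1 + b*x2 = e, c*x1 + d*x2 = f; assumes a*d - b*c != 0."""
    det = a * d - b * c
    return Fraction(e * d - b * f, det), Fraction(a * f - e * c, det)


# First system: x1 + 2x2 = 10, 2x1 - 2x2 = -4, 3x1 + 5x2 = 26.
print(satisfies_system([[1, 2], [2, -2], [3, 5]], [10, -4, 26], [2, 4]))  # True

# Second system: the first two equations alone force x1 = 2, x2 = 4 ...
x = cramer_2x2(1, 2, 1, -1, 10, -2)
# ... but 6*2 + 10*4 = 52 != 40, so the whole system has no solution.
print(satisfies_system([[1, 2], [1, -1], [6, 10]], [10, -2, 40], x))  # False
```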

In what follows we define vectors and matrices. They provide a very con- 
venient formalism for representing and manipulating linear systems as well as 
many other objects in mathematics and computer science. 

Definition 2.2.1.3. Let $S \subseteq \mathbb{R}$ be any set satisfying $a + b \in S$ and $a \cdot b \in S$
for all $a, b \in S$. Let n and m be positive integers. An m × n matrix A over
S is a rectangular array of $m \cdot n$ elements of S arranged in m horizontal rows
and n vertical columns:

$$
A = [a_{ij}]_{i=1,\ldots,m,\,j=1,\ldots,n} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
$$

For all $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$, $a_{ij}$ is called the (i, j)-entry² of A.
The ith row of A is

$(a_{i1}, a_{i2}, \ldots, a_{in})$

for all $i \in \{1, \ldots, m\}$. For each $j \in \{1, \ldots, n\}$, the jth column of A is

$$
\begin{pmatrix} a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj} \end{pmatrix}.
$$

For any positive integers n and m, a 1 × n matrix is called an n-dimensional
row-vector, and an m × 1 matrix is called an m-dimensional column-vector.

1 We assume that the reader is familiar with the method of elimination that
efficiently finds a solution of linear systems. We do not present the method
here, because it is not interesting to us from the algorithmic point of view.

2 Observe that $a_{ij}$ lies on the intersection of the ith row and the jth column.



For any positive integer n, an n × n matrix is called a square matrix
of order n. If $A = [a_{ij}]_{i,j=1,\ldots,n}$ is a square matrix, then we say that the
elements $a_{11}, a_{22}, \ldots, a_{nn}$ form the main diagonal of A.

A is called the 1-diagonal matrix (or the identity matrix), denoted
by $I_n$, if

(i) $a_{ii} = 1$ for $i = 1, \ldots, n$, and

(ii) $a_{ij} = 0$ for all $i \neq j$, $i, j \in \{1, \ldots, n\}$.

A is called the 0-diagonal matrix if

(i) $a_{ii} = 0$ for all $i = 1, \ldots, n$, and
(ii) $a_{ij} = 1$ for $i \neq j$, $i, j \in \{1, \ldots, n\}$.

An m × n matrix $B = [b_{ij}]_{i=1,\ldots,m,\,j=1,\ldots,n}$ is called a Boolean matrix if
$b_{ij} \in \{0, 1\}$ for $i = 1, \ldots, m$, $j = 1, \ldots, n$.

An m × n matrix $B = [b_{ij}]_{i=1,\ldots,m,\,j=1,\ldots,n}$ is called a zero matrix if
$b_{ij} = 0$ for all $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$. The zero matrix of size m × n
is denoted by $0_{m \times n}$.
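Representing a matrix as a list of rows, the special matrices just defined can be built directly from their entry-wise descriptions. A minimal Python sketch; all function names are our own.

```python
def identity(n):
    """I_n: ones on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def zero_diagonal(n):
    """The 0-diagonal matrix: zeros on the main diagonal, ones elsewhere."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def zero_matrix(m, n):
    """0_{m x n}: every entry is 0."""
    return [[0] * n for _ in range(m)]


def is_boolean(b):
    """A matrix is Boolean if every entry lies in {0, 1}."""
    return all(x in (0, 1) for row in b for x in row)


print(identity(3))       # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(zero_diagonal(3))  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(is_boolean(zero_diagonal(3)), is_boolean([[2, 0]]))  # True False
```

Note that $I_n$, the 0-diagonal matrix, and $0_{n \times n}$ are all Boolean matrices.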



The following matrix $B = [b_{ij}]_{i=1,\ldots,3,\,j=1,\ldots,4}$ is an example of a 3 × 4
matrix over $\mathbb{Q}$:

B = …

(|, 1, −8) is the second row of B. The (3, 4)-entry of B is $b_{34} = 0$.



Definition 2.2.1.4. Let m, n be two positive integers. Two m × n matrices
$A = [a_{ij}]$ and $B = [b_{ij}]$ are said to be equal if $a_{ij} = b_{ij}$ for all $i \in \{1, \ldots, m\}$,
$j \in \{1, \ldots, n\}$.

The sum of A and B (A + B) is the matrix $C = [c_{ij}]_{i=1,\ldots,m,\,j=1,\ldots,n}$
defined by³

$c_{ij} = a_{ij} + b_{ij}$

for all $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$.



Exercise 2.2.1.5. Let A, B, C be matrices over $\mathbb{R}$ of the same size m × n, $m, n \in \mathbb{N} - \{0\}$.
Prove that

(i) A + B = B + A,

(ii) A + (B + C) = (A + B) + C,

(iii) there exists a negative of A (i.e., there exists an m × n matrix D = (−A)
such that $A + D = 0_{m \times n}$).

□



3 Observe that the sum of the matrices is defined only when A and B have the 
same number of rows and the same number of columns. 




Definition 2.2.1.6. Let m, p, n be positive integers. Let

$A = [a_{ij}]_{i=1,\ldots,m,\,j=1,\ldots,p}$ and $B = [b_{ij}]_{i=1,\ldots,p,\,j=1,\ldots,n}$

be two matrices. The product (multiplication) of A and B is the m × n
matrix $C = [c_{ij}]_{i=1,\ldots,m,\,j=1,\ldots,n}$ defined by

$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ip}b_{pj} = \sum_{k=1}^{p} a_{ik}b_{kj}$

for $i = 1, \ldots, m$, $j = 1, \ldots, n$.



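To illustrate Definition 2.2.1.6 consider the matrices

$$
A = \begin{pmatrix} 1 & 0 & -2\\ 0 & 3 & -1 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & -3\\ 5 & 0\\ 0 & 4 \end{pmatrix}.
$$

Then,

$$
A \cdot B =
\begin{pmatrix}
1 \cdot 1 + 0 \cdot 5 + (-2) \cdot 0 & 1 \cdot (-3) + 0 \cdot 0 + (-2) \cdot 4\\
0 \cdot 1 + 3 \cdot 5 + (-1) \cdot 0 & 0 \cdot (-3) + 3 \cdot 0 + (-1) \cdot 4
\end{pmatrix}
=
\begin{pmatrix} 1 & -11\\ 15 & -4 \end{pmatrix}.
$$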

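Observe that $B \cdot A$ is defined here, too: the product of B and A is defined
only if the number of columns of B is equal to the number of rows of A, and
B has two columns while A has two rows. However, $B \cdot A$ is a 3 × 3 matrix,
so $A \cdot B \neq B \cdot A$.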
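Definition 2.2.1.6 translates directly into code. The sketch below (the name `mat_mul` is our own) rejects incompatible sizes, reproduces $A \cdot B$ for the matrices above, and also computes $B \cdot A$, which exists here because B has as many columns as A has rows.

```python
def mat_mul(a, b):
    """Product of an m x p matrix a and a p x n matrix b (lists of rows)."""
    p = len(b)
    if any(len(row) != p for row in a):
        raise ValueError("number of columns of a must equal number of rows of b")
    n = len(b[0])
    # c[i][j] = sum over k of a[i][k] * b[k][j]
    return [[sum(row[k] * b[k][j] for k in range(p)) for j in range(n)] for row in a]


A = [[1, 0, -2],
     [0, 3, -1]]
B = [[1, -3],
     [5, 0],
     [0, 4]]
print(mat_mul(A, B))  # [[1, -11], [15, -4]]
print(mat_mul(B, A))  # [[1, -9, 1], [5, 0, -10], [0, 12, -4]]
```

Since the two products even have different sizes, this pair already shows that matrix multiplication is not commutative.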

Exercise 2.2.1.7. Find two square matrices A and B over $\mathbb{Z}$ such that
$A \cdot B \neq B \cdot A$. □

Exercise 2.2.1.8. Let A, B, C be m × p, p × q, q × n matrices over $\mathbb{R}$,
respectively. Prove:

$A \cdot (B \cdot C) = (A \cdot B) \cdot C$. □

Exercise 2.2.1.9. Prove that for every n × n matrix A,

$A \cdot I_n = I_n \cdot A = A$. □

Definition 2.2.1.10. Let r be a real number, and let $A = [a_{ij}]$ be an m × n
matrix over $\mathbb{R}$. The scalar multiple of A by r, $r \cdot A$, is the m × n matrix
$B = [b_{ij}]$, where

$b_{ij} = r \cdot a_{ij}$

for $i = 1, \ldots, m$, $j = 1, \ldots, n$.

Definition 2.2.1.11. Let $A = [a_{ij}]$ be an m × n matrix, $m, n \in \mathbb{N} - \{0\}$. The
n × m matrix $B = [b_{ij}]$, where

$a_{ij} = b_{ji}$ for $i = 1, \ldots, m$, $j = 1, \ldots, n$,

is called the transpose of A and denoted by $A^T$. If $A = A^T$, then we say that
A is symmetric.




Exercise 2.2.1.12. Prove, for every real number r and any matrices A and
B over $\mathbb{R}$ for which the operations below are defined, that the following
equalities hold:

(i) $(A^T)^T = A$,

(ii) $(A + B)^T = A^T + B^T$,

(iii) $(A \cdot B)^T = B^T \cdot A^T$,

(iv) $(rA)^T = r \cdot A^T$.

□
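Before proving the identities of Exercise 2.2.1.12, it can help to test them on concrete matrices. The sketch below checks all four on the matrices A and B of the multiplication example plus an arbitrary 2 × 3 matrix C chosen by us; this is a spot-check on one instance, not a proof. All function names are hypothetical.

```python
def transpose(a):
    """A^T: the (i, j)-entry of A becomes the (j, i)-entry of A^T."""
    return [list(col) for col in zip(*a)]


def mat_add(a, b):
    """Entry-wise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    """Product of an m x p and a p x n matrix."""
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for row in a]


def scale(r, a):
    """Scalar multiple r * A."""
    return [[r * x for x in row] for row in a]


A = [[1, 0, -2], [0, 3, -1]]
B = [[1, -3], [5, 0], [0, 4]]
C = [[2, 1, 0], [-1, 4, 7]]  # arbitrary 2 x 3 matrix, same size as A

assert transpose(transpose(A)) == A                                     # (i)
assert transpose(mat_add(A, C)) == mat_add(transpose(A), transpose(C))  # (ii)
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))  # (iii)
assert transpose(scale(5, A)) == scale(5, transpose(A))                 # (iv)
print(transpose(A))  # [[1, 0], [0, 3], [-2, -1]]
```

Identity (iii) is the one most often misremembered: the order of the factors is reversed.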

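Now we show how matrices can be used to represent systems of linear
equations. Consider the system (2.2) of Definition 2.2.1.2. Define

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}, \quad
X = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}, \quad
Y = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_m \end{pmatrix}.
$$

Then, the linear system (2.2) can be written in the matrix form

$A \cdot X = Y$.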

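The matrix A is called the coefficient matrix of the linear system (2.2).
For instance, for the system of linear equations

$-x_1 + 2x_2 - 3x_3 = 7$
$6x_1 + x_2 + x_3 = 5$

the coefficient matrix A and the vectors X and Y are

$$
A = \begin{pmatrix} -1 & 2 & -3\\ 6 & 1 & 1 \end{pmatrix}, \quad
X = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}, \quad
Y = \begin{pmatrix} 7\\ 5 \end{pmatrix}.
$$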

Definition 2.2.1.13. Let A be a square n × n matrix, $n \in \mathbb{N} - \{0\}$. A is
called nonsingular (or invertible) if there exists an n × n matrix B such
that

$A \cdot B = B \cdot A = I_n$.

The matrix B is called the inverse of A and denoted⁴ by $A^{-1}$. If there exists
no inverse of A, then A is called singular (or noninvertible).

One can easily verify that, for 

^=(22) and 5 =("l-i)> 

$A \cdot B = B \cdot A = I_2$, and so $A^{-1} = B$ and $B^{-1} = A$. Observe also that $I_n^{-1} = I_n$
for any positive integer n.



4 Note that if, for a matrix A, there exists a matrix B with the property $A \cdot B = B \cdot A = I_n$,
then B is the unique inverse of A.




Exercise 2.2.1.14. Prove the following assertion. If $A_1, A_2, \ldots, A_r$ are n × n
nonsingular matrices, then $A_1 \cdot A_2 \cdots A_r$ is nonsingular, and

$(A_1 \cdot A_2 \cdots A_r)^{-1} = A_r^{-1} \cdot A_{r-1}^{-1} \cdots A_1^{-1}$. □

Let $A \cdot X = Y$ be a system of linear equations where the coefficient matrix A
is an n × n nonsingular matrix. Then one can solve this system by constructing
$A^{-1}$, because multiplying the equality

$A \cdot X = Y$

by $A^{-1}$ from the left side one obtains

$A^{-1} \cdot A \cdot X = A^{-1} \cdot Y$.

Since $A^{-1} \cdot A = I_n$ and $I_n \cdot X = X$, we obtain

$X = A^{-1} \cdot Y$.
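The text does not fix how $A^{-1}$ is constructed. The sketch below assumes Gauss-Jordan elimination on the augmented matrix $[A \mid I_n]$ with exact rational arithmetic (our choice for illustration); `inverse` and `mat_vec` are hypothetical helper names. It then applies $X = A^{-1} \cdot Y$ to the system $x_1 + 3x_2 = 4$, $x_1 - x_2 = 0$, whose unique solution is $x_1 = x_2 = 1$.

```python
from fractions import Fraction


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I_n].

    Raises ValueError if A is singular."""
    n = len(a)
    # Augment A with I_n, using exact rational arithmetic.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot becomes 1.
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # The right half of the augmented matrix is now A^{-1}.
    return [row[n:] for row in m]


def mat_vec(a, x):
    """Compute A * X for a column vector X given as a list."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


A = [[1, 3], [1, -1]]
Y = [4, 0]
X = mat_vec(inverse(A), Y)
print(X)  # [Fraction(1, 1), Fraction(1, 1)]
```

In practice one usually solves $A \cdot X = Y$ by elimination directly rather than forming $A^{-1}$; computing the inverse is shown here only because it mirrors the derivation above.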

Now we look at the geometrical interpretation of systems of linear equa- 
tions. 

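Definition 2.2.1.15. For any positive integer n, we define the n-dimensional
($\mathbb{R}$-) vector space

$$
\mathbb{R}^n = \left\{ \begin{pmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{pmatrix} \;\middle|\; a_i \in \mathbb{R} \text{ for } i = 1, \ldots, n \right\}.
$$

The vector $0_{n \times 1}$ is called the origin of $\mathbb{R}^n$.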



There are two possible geometrical interpretations of the elements of $\mathbb{R}^n$.
One possibility is to assign to an element

$X = (a_1, a_2, \ldots, a_n)^T$

of $\mathbb{R}^n$ the point with the coordinates $a_1, a_2, \ldots, a_n$ in $\mathbb{R}^n$. Another possibility
is to assign to it a directed line from $(0, 0, \ldots, 0)^T$ to $(a_1, \ldots, a_n)^T$. This directed
line is called the vector $(a_1, \ldots, a_n)^T$.

Consider $\mathbb{R}^2$. To build a geometrical interpretation of $\mathbb{R}^2$ one starts from
the origin $(0, 0)^T$. One draws two infinite lines which are orthogonal to each
other and which intersect at the origin. One of the lines is usually in a
horizontal position and is called the x-axis. The other infinite line, the y-axis,
is taken in a vertical position (see Figure 2.1). Then, the positive reals are
placed on the x-axis to the right of the origin in increasing order and the
negative reals are placed on the x-axis to the left of the origin in decreasing
order. Similarly, the y-axis above the origin contains the positive real numbers
and the y-axis below the origin contains the negative real numbers. For any
point X of the plane one can determine the coordinates of X as follows:

(i) Take a line l that contains the point X and is orthogonal (perpendicular)
to the x-axis (parallel to the y-axis). The real number $a_x$ associated with
the intersection of the x-axis and l is the x-coordinate of X.

(ii) Take a line h that contains the point X and is orthogonal to the y-axis
(parallel to the x-axis). The real number $a_y$ associated with the
intersection of the y-axis and h is the y-coordinate of X.

We shall denote the point X by $P(a_x, a_y)$, and the corresponding vector
by $(a_x, a_y)^T$.

Fig. 2.1.



Definition 2.2.1.16. Let $P(a_1, a_2)$ and $P(b_1, b_2)$ be two points in $\mathbb{R}^2$. The
(Euclidean) distance between $P(a_1, a_2)$ and $P(b_1, b_2)$ is defined by

$\mathrm{distance}(P(a_1, a_2), P(b_1, b_2)) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$.

We observe from Figure 2.2 that, by the Pythagorean Theorem, the Euclidean
distance between two points is exactly the length of the line segment that
connects $P(a_1, a_2)$ and $P(b_1, b_2)$.




Fig. 2.2.

Exercise 2.2.1.17. Prove that, for every three points $P(a_1, a_2)$, $P(b_1, b_2)$,
and $P(c_1, c_2)$,

(i) $\mathrm{distance}(P(a_1, a_2), P(a_1, a_2)) = 0$,

(ii) $\mathrm{distance}(P(a_1, a_2), P(b_1, b_2)) = \mathrm{distance}(P(b_1, b_2), P(a_1, a_2))$, and

(iii) $\mathrm{distance}(P(a_1, a_2), P(b_1, b_2)) \leq \mathrm{distance}(P(a_1, a_2), P(c_1, c_2)) + \mathrm{distance}(P(c_1, c_2), P(b_1, b_2))$.

□
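A direct transcription of Definition 2.2.1.16, followed by a spot-check of the three properties of Exercise 2.2.1.17 on three sample points we picked (a check on one instance, not a proof). The function name `distance` mirrors the text's notation; points are assumed to be pairs of numbers.

```python
import math


def distance(p, q):
    """Euclidean distance between points p = (a1, a2) and q = (b1, b2)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


P, Q, R = (0, 0), (3, 4), (6, 0)
print(distance(P, Q))  # 5.0

assert distance(P, P) == 0                                # (i)
assert distance(P, Q) == distance(Q, P)                   # (ii)
assert distance(P, Q) <= distance(P, R) + distance(R, Q)  # (iii): 5 <= 6 + 5
```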

We can find a natural interpretation of systems of linear equations in two
variables in $\mathbb{R}^2$. One can assign to every linear equation

$a_1x_1 + a_2x_2 = b$

with $(a_1, a_2) \neq (0, 0)$ the straight line $x_1 = \frac{b - a_2x_2}{a_1}$ if $a_1 \neq 0$,
and the straight line $x_2 = \frac{b}{a_2}$ if $a_1 = 0$. This straight line consists of all points
of $\mathbb{R}^2$ that satisfy the given linear equation. So, the set of all solutions to
a system of linear equations is exactly the intersection of all straight lines
corresponding to the particular equations.

Figure 2.3 contains four lines corresponding to the linear equations x\ + 
3^2 = 8, xi + 3 x 2 = 4, xi + 0 • X2 = 2, and X\ — X2 — 0. The system of linear 
equations 



xi + 3 x 2 = 4 

Xi — X 2 = 0 




22 



2 Elementary Fundamentals 




Fig. 2.3. 



has the unique solution P(l,l) (i.e., x\ = 1, x 2 = 1) that is the intersec- 
tion point of the lines corresponding to these equations. The system of linear 
equations 



x\ + 3x 2 = 4 
xi -f- 3x2 = 8 

does not have any solution because the corresponding lines do not intersect. 5 
On the other hand, the system of linear equations

$$\begin{aligned} x_1 + 3x_2 &= 4\\ 2x_1 + 6x_2 &= 8 \end{aligned}$$

has infinitely many solutions because both linear equations $x_1 + 3x_2 = 4$ and $2x_1 + 6x_2 = 8$ define the same line, and so every point of this line is a solution.
Finally, we observe that the system of linear equations

$$\begin{aligned} x_1 + 3x_2 &= 8\\ x_1 + 0\cdot x_2 &= 2\\ x_1 - x_2 &= 0 \end{aligned}$$

has exactly one solution $P(2, 2)$ (i.e., $x_1 = 2$, $x_2 = 2$), and that the system of linear equations

$$\begin{aligned} x_1 + 3x_2 &= 4\\ x_1 + 0\cdot x_2 &= 2\\ x_1 - x_2 &= 0 \end{aligned}$$

does not have any solution.

⁵ Because they are parallel.
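The three possible outcomes above are one intersection point, parallel lines, and coinciding lines. For a system of two equations they can be told apart with the determinant $a_1c_2 - a_2c_1$. The following Python sketch is our own illustration, not a method from the text. It uses Cramer's rule when the determinant is nonzero, and it assumes that neither equation has both coefficients equal to zero. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def solve_2x2(eq1, eq2):
    """Solve a1*x1 + a2*x2 = b1 and c1*x1 + c2*x2 = b2, each given as (coef, coef, rhs).

    Returns ("unique", (x1, x2)), ("none", None) or ("infinite", None).
    Assumes neither equation has both coefficients equal to zero.
    """
    a1, a2, b1 = map(Fraction, eq1)
    c1, c2, b2 = map(Fraction, eq2)
    det = a1 * c2 - a2 * c1
    if det != 0:
        # Cramer's rule: the two lines meet in exactly one point.
        return ("unique", ((b1 * c2 - a2 * b2) / det, (a1 * b2 - b1 * c1) / det))
    # det == 0: the lines are parallel or identical; they coincide exactly
    # when the full rows (coefficients and right-hand side) are proportional.
    if a1 * b2 - b1 * c1 == 0 and a2 * b2 - b1 * c2 == 0:
        return ("infinite", None)
    return ("none", None)

print(solve_2x2((1, 3, 4), (1, -1, 0))[0])  # unique
print(solve_2x2((1, 3, 4), (1, 3, 8))[0])   # none
print(solve_2x2((1, 3, 4), (2, 6, 8))[0])   # infinite
```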

We see that any nontrivial linear equation of two variables determines a line that is a one-dimensional part of $\mathbb{R}^2$. In general, any nontrivial linear equation over $n$ variables determines an $(n-1)$-dimensional subpart⁶ of $\mathbb{R}^n$. To understand it geometrically, we present some elementary fundamentals of the theory of vector spaces.

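Definition 2.2.1.18. Let $W \subseteq \mathbb{R}^n$ be a nonempty set, $n \in \mathbb{N} - \{0\}$. We say that $W$ is a (linear) vector subspace of $\mathbb{R}^n$ if, for all $r_1, r_2 \in \mathbb{R}$ and all $(a_1, a_2, \ldots, a_n)^{\mathsf T}, (b_1, b_2, \ldots, b_n)^{\mathsf T} \in W$,

$$r_1\cdot(a_1, a_2, \ldots, a_n)^{\mathsf T} + r_2\cdot(b_1, b_2, \ldots, b_n)^{\mathsf T} \in W.$$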

Note that every vector subspace of $\mathbb{R}^n$ contains the origin $0_{n\times 1} = (0, \ldots, 0)^{\mathsf T}$, because one can set $r_1 = r_2 = 0$.

Let $V = \{(a_1, a_2, 0)^{\mathsf T} \mid a_1, a_2 \in \mathbb{R}\}$. We observe that $V$ is a vector subspace of $\mathbb{R}^3$ because

$$r_1\cdot(b_1, b_2, 0)^{\mathsf T} + r_2\cdot(d_1, d_2, 0)^{\mathsf T} = (r_1b_1 + r_2d_1, r_1b_2 + r_2d_2, 0)^{\mathsf T} \in V$$

for all real numbers $r_1, r_2, b_1, b_2, d_1, d_2$.

Definition 2.2.1.19. Let $A = [a_{ij}]_{i=1,\ldots,m,\ j=1,\ldots,n}$ be a matrix and let $X = (x_1, x_2, \ldots, x_n)^{\mathsf T}$. For every homogeneous linear system $A\cdot X = 0_{m\times 1}$, we define the set of solutions to $A\cdot X = 0_{m\times 1}$ as

$$\mathrm{Sol}(A) = \{Y \in \mathbb{R}^n \mid A\cdot Y = 0_{m\times 1}\}.$$

Analogously, for every $A$ and every $b \in \mathbb{R}^m$, the set of solutions to $A\cdot X = b$ is

$$\mathrm{Sol}(A, b) = \{Y \in \mathbb{R}^n \mid A\cdot Y = b\}.$$

Lemma 2.2.1.20. Let $A\cdot X = 0_{m\times 1}$ be a system of linear equations where $A$ is an $m\times n$ matrix, $m, n \in \mathbb{N} - \{0\}$. The set $\mathrm{Sol}(A)$ of all solutions to the linear system $A\cdot X = 0_{m\times 1}$ is a vector subspace of $\mathbb{R}^n$.

Proof. $X = 0_{n\times 1}$ is a solution of every homogeneous system of linear equations of $n$ variables; thus, $\mathrm{Sol}(A)$ is not empty. Let $X_1$ and $X_2$ be arbitrary vectors from $\mathrm{Sol}(A)$, and let $r_1$ and $r_2$ be arbitrary reals. We have to prove that $r_1X_1 + r_2X_2 \in \mathrm{Sol}(A)$. This can be done by the following simple calculation:

$$\begin{aligned} A(r_1X_1 + r_2X_2) &= Ar_1X_1 + Ar_2X_2\\ &= r_1AX_1 + r_2AX_2\\ &= r_1\cdot 0_{m\times 1} + r_2\cdot 0_{m\times 1} = 0_{m\times 1}. \end{aligned}$$

□

⁶ Later, we shall see it defined by the term “affine subspace” (the term “manifold” is used in some literature, too).







The trivial vector subspace of $\mathbb{R}^n$ is $\{0_{n\times 1}\}$. Obviously, there is no nontrivial vector subspace $W$ of $\mathbb{R}^n$ with a finite cardinality. This is because, for each $X \in W$ with $X \neq 0_{n\times 1}$, the infinite set $\{r\cdot X \mid r \in \mathbb{R}\}$ is a subset of $W$.

Definition 2.2.1.21. Let $X, X_1, X_2, \ldots, X_k$ be vectors from $\mathbb{R}^n$, $n, k \in \mathbb{N} - \{0\}$. The vector $X$ is a linear combination of the vectors $X_1, X_2, \ldots, X_k$ if there exist real numbers $c_1, c_2, \ldots, c_k$ such that

$$X = c_1X_1 + c_2X_2 + \cdots + c_kX_k.$$

For instance, $(4, 2, 10, -10)^{\mathsf T}$ is a linear combination of the vectors $(1, 2, 1, -1)^{\mathsf T}$, $(1, 0, 2, -3)^{\mathsf T}$, and $(1, 1, 0, -2)^{\mathsf T}$ because

$$(4, 2, 10, -10)^{\mathsf T} = 2\cdot(1, 2, 1, -1)^{\mathsf T} + 4\cdot(1, 0, 2, -3)^{\mathsf T} - 2\cdot(1, 1, 0, -2)^{\mathsf T}.$$
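The example can be checked mechanically. This short Python sketch evaluates $c_1X_1 + \cdots + c_kX_k$ coordinate by coordinate, with vectors written as tuples. The helper name `linear_combination` is ours.

```python
def linear_combination(coeffs, vectors):
    """c_1*X_1 + ... + c_k*X_k, computed coordinate by coordinate."""
    return tuple(sum(c * X[i] for c, X in zip(coeffs, vectors))
                 for i in range(len(vectors[0])))

X1, X2, X3 = (1, 2, 1, -1), (1, 0, 2, -3), (1, 1, 0, -2)
print(linear_combination((2, 4, -2), (X1, X2, X3)))  # (4, 2, 10, -10)
```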

Definition 2.2.1.22. Let $S = \{X_1, X_2, \ldots, X_k\} \subseteq \mathbb{R}^n$ be a set of nonzero vectors, and let $W$ be a subset of $\mathbb{R}^n$, $k \in \mathbb{N}$, $n \in \mathbb{N} - \{0\}$.

We say that $S$ spans $W$ if every vector from $W$ is a linear combination of vectors from $S$. The trivial vector subspace $\{0_{n\times 1}\} \subseteq \mathbb{R}^n$ is spanned by the empty set $S$.

The set $S$ is called linearly dependent if there exist reals $c_1, c_2, \ldots, c_k$, not all zero, such that

$$c_1X_1 + c_2X_2 + \cdots + c_kX_k = 0_{n\times 1}.$$

Otherwise, $S$ is called linearly independent (i.e., the equality $c_1X_1 + c_2X_2 + \cdots + c_kX_k = 0_{n\times 1}$ holds only for $c_1 = c_2 = \cdots = c_k = 0$).

The set $S$ is called a basis of a vector subspace $U \subseteq \mathbb{R}^n$ if

(i) $S$ spans $U$, and

(ii) $S$ is linearly independent.

For any subspace $V \subseteq \mathbb{R}^n$ we say that the dimension of $V$ is $k \in \mathbb{N}$ if there exists a set of vectors $S \subseteq \mathbb{R}^n$ such that

(i) $|S| = k$, and

(ii) $S$ is a basis of $V$.

We say also that $V$ is a $k$-dimensional subspace of $\mathbb{R}^n$, or that $\dim(V) = k$.

For instance, $(1, 0, 0, 0)^{\mathsf T}, (0, 1, 0, 0)^{\mathsf T}, (0, 0, 1, 0)^{\mathsf T}, (0, 0, 0, 1)^{\mathsf T}$ is a basis of $\mathbb{R}^4$ because

(i) $(a, b, c, d)^{\mathsf T} = a\cdot(1, 0, 0, 0)^{\mathsf T} + b\cdot(0, 1, 0, 0)^{\mathsf T} + c\cdot(0, 0, 1, 0)^{\mathsf T} + d\cdot(0, 0, 0, 1)^{\mathsf T}$ for each vector $(a, b, c, d)^{\mathsf T} \in \mathbb{R}^4$, and

(ii) $c_1\cdot(1, 0, 0, 0)^{\mathsf T} + c_2\cdot(0, 1, 0, 0)^{\mathsf T} + c_3\cdot(0, 0, 1, 0)^{\mathsf T} + c_4\cdot(0, 0, 0, 1)^{\mathsf T} = (0, 0, 0, 0)^{\mathsf T}$ if and only if $c_1 = c_2 = c_3 = c_4 = 0$.







Let us consider the following linear equation

$$a_1x_1 + a_2x_2 + a_3x_3 + \cdots + a_nx_n = b \qquad (2.3)$$

of $n$ variables (unknowns) $x_1, x_2, \ldots, x_n$. We can also write (2.3) as $A\cdot X = b$, where $A = (a_1, a_2, \ldots, a_n)$ and $X = (x_1, x_2, \ldots, x_n)^{\mathsf T}$.

Lemma 2.2.1.23. For every linear homogeneous equation $A\cdot X = 0$, where $A = (a_1, \ldots, a_n) \neq (0, \ldots, 0)$ and $X = (x_1, \ldots, x_n)^{\mathsf T}$, $\mathrm{Sol}(A)$ is an $(n-1)$-dimensional subspace of $\mathbb{R}^n$.



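Proof. Without loss of generality we assume $a_1 \neq 0$. Then

$$x_1 = -\frac{a_2}{a_1}x_2 - \frac{a_3}{a_1}x_3 - \cdots - \frac{a_n}{a_1}x_n$$

is another form of the linear equation (2.3) for $b = 0$. So,

$$\mathrm{Sol}(A) = \left\{\left(-\frac{a_2}{a_1}y_2 - \frac{a_3}{a_1}y_3 - \cdots - \frac{a_n}{a_1}y_n,\ y_2, y_3, \ldots, y_n\right)^{\mathsf T} \;\middle|\; y_2, y_3, \ldots, y_n \in \mathbb{R}\right\}$$

is the set of all solutions of $AX = 0$. We verify this claim by the following calculation:

$$\begin{aligned} (a_1, a_2, \ldots, a_n)\cdot\left(-\frac{a_2}{a_1}y_2 - \cdots - \frac{a_n}{a_1}y_n,\ y_2, y_3, \ldots, y_n\right)^{\mathsf T} &= (-a_2y_2 - a_3y_3 - \cdots - a_ny_n) + a_2y_2 + a_3y_3 + \cdots + a_ny_n\\ &= 0. \end{aligned}$$

Now we claim that

$$S = \left\{\left(-\frac{a_2}{a_1}, 1, 0, \ldots, 0\right)^{\mathsf T}, \left(-\frac{a_3}{a_1}, 0, 1, 0, \ldots, 0\right)^{\mathsf T}, \ldots, \left(-\frac{a_n}{a_1}, 0, \ldots, 0, 1\right)^{\mathsf T}\right\}$$

is a basis of $\mathrm{Sol}(A)$. $S$ spans $\mathrm{Sol}(A)$ because

$$\begin{aligned} \left(-\frac{a_2}{a_1}y_2 - \cdots - \frac{a_n}{a_1}y_n,\ y_2, y_3, \ldots, y_n\right)^{\mathsf T} = {} & y_2\cdot\left(-\frac{a_2}{a_1}, 1, 0, \ldots, 0\right)^{\mathsf T} + y_3\cdot\left(-\frac{a_3}{a_1}, 0, 1, 0, \ldots, 0\right)^{\mathsf T}\\ & + \cdots + y_n\cdot\left(-\frac{a_n}{a_1}, 0, \ldots, 0, 1\right)^{\mathsf T} \end{aligned}$$

for all $y_2, y_3, \ldots, y_n \in \mathbb{R}$.

It remains to be shown that the vectors of $S$ are linearly independent. Assume that

$$c_1\cdot\left(-\frac{a_2}{a_1}, 1, 0, \ldots, 0\right)^{\mathsf T} + c_2\cdot\left(-\frac{a_3}{a_1}, 0, 1, 0, \ldots, 0\right)^{\mathsf T} + \cdots + c_{n-1}\cdot\left(-\frac{a_n}{a_1}, 0, \ldots, 0, 1\right)^{\mathsf T} = 0_{n\times 1}.$$

Then, in particular, comparing the coordinates $2, 3, \ldots, n$ on both sides gives

$$c_1\cdot 1 = 0,\quad c_2\cdot 1 = 0,\quad \ldots,\quad c_{n-1}\cdot 1 = 0.$$

Thus, $c_1 = c_2 = \cdots = c_{n-1} = 0$. Since $S$ is a basis of $\mathrm{Sol}(A)$ with $|S| = n - 1$, we have $\dim(\mathrm{Sol}(A)) = n - 1$. □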
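The construction in this proof is easy to carry out by machine. The Python sketch below builds the $n-1$ vectors of $S$ and can be used to confirm that each of them solves $A\cdot X = 0$. The function names are ours. Like the proof, it assumes $a_1 \neq 0$; if this is not the case, reorder the variables first.

```python
from fractions import Fraction

def solution_basis(a):
    """Basis S of Sol(A) for A = (a_1, ..., a_n), following the proof of Lemma 2.2.1.23.

    Assumes a_1 != 0. The j-th vector has -a_{j+1}/a_1 in the first coordinate,
    1 in coordinate j+1, and 0 elsewhere.
    """
    a = [Fraction(x) for x in a]
    if a[0] == 0:
        raise ValueError("this sketch assumes a_1 != 0")
    n = len(a)
    basis = []
    for j in range(1, n):
        v = [Fraction(0)] * n
        v[0] = -a[j] / a[0]
        v[j] = Fraction(1)
        basis.append(v)
    return basis

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

a = (1, 1, 2)                            # the homogeneous equation x1 + x2 + 2*x3 = 0
S = solution_basis(a)
print(len(S))                            # 2, i.e. n - 1
print(all(dot(a, v) == 0 for v in S))    # True
```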



If one considers the vector space $\mathbb{R}^2$, then the set of solutions of a linear equation $c_1x_1 + c_2x_2 = 0$ with $(c_1, c_2) \neq (0, 0)$ is the set of all points of the corresponding straight line. This line is $x_1 = -\frac{c_2}{c_1}\cdot x_2$ if $c_1 \neq 0$, and $x_2 = 0$ if $c_1 = 0$. Examples are presented in Figure 2.3. In $\mathbb{R}^3$ the set of all solutions of a linear homogeneous equation is a two-dimensional subspace of $\mathbb{R}^3$ (Fig. 2.4).

As we have already observed, the set $\mathrm{Sol}(A)$ of solutions $Y$ to $A\cdot Y = 0_{m\times 1}$ is a subspace of $\mathbb{R}^n$. The question is what the dimension of $\mathrm{Sol}(A)$ is. Obviously, if $A$ is not a zero matrix, the dimension of $\mathrm{Sol}(A)$ is at most $n - 1$.







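Theorem 2.2.1.24. Let $U$ be a subset of $\mathbb{R}^n$, $n \in \mathbb{N} - \{0\}$. $U$ is a subspace of $\mathbb{R}^n$ if and only if there exists a matrix $A$ such that $U = \mathrm{Sol}(A)$.

Proof. We have already mentioned that $\mathrm{Sol}(A)$ is a subspace of $\mathbb{R}^n$ if $A$ is an $r\times n$ matrix, $r, n \in \mathbb{N} - \{0\}$. It remains to be shown that, for every subspace $U \subseteq \mathbb{R}^n$, there exists a matrix $A$ such that $U = \mathrm{Sol}(A)$.

Let $S = \{S_1, S_2, \ldots, S_m\}$ be a basis of $U$, with $S_i = (s_{i1}, \ldots, s_{in})$ for $i = 1, 2, \ldots, m$. Now we construct an $r\times n$ matrix $A = [a_{lj}]$ such that

$$A\cdot S_k^{\mathsf T} = 0_{r\times 1} \qquad (2.4)$$

for every $k = 1, 2, \ldots, m$. Setting $\mathcal{S} = [s_{ij}]_{i=1,\ldots,m,\ j=1,\ldots,n}$, we can express (2.4) as

$$A\cdot\mathcal{S}^{\mathsf T} = 0_{r\times m}. \qquad (2.5)$$

But (2.5) is equivalent to the requirement

$$(a_{l1}, a_{l2}, \ldots, a_{ln})\cdot\mathcal{S}^{\mathsf T} = 0_{1\times m} \qquad (2.6)$$

for every $l \in \{1, \ldots, r\}$. So the $l$th equation of (2.6) can be seen as a system of $m$ linear equations of $n$ unknowns $a_{l1}, a_{l2}, \ldots, a_{ln}$. Solving this linear system one determines the values $a_{l1}, a_{l2}, \ldots, a_{ln}$. We choose the $r$ rows of $A$ so that together they span the set of all solutions of (2.6), and this determines the matrix $A$. It remains to be proven that $U = \mathrm{Sol}(A)$. Next we prove $U \subseteq \mathrm{Sol}(A)$. The opposite direction $\mathrm{Sol}(A) \subseteq U$ is left to the reader.

As $S$ is a basis of $U$, we have, for every $X \in U$,

$$X = c_1S_1^{\mathsf T} + c_2S_2^{\mathsf T} + \cdots + c_mS_m^{\mathsf T}$$

for some $c_1, c_2, \ldots, c_m \in \mathbb{R}$. Thus

$$\begin{aligned} A\cdot X &= A\cdot(c_1S_1^{\mathsf T} + c_2S_2^{\mathsf T} + \cdots + c_mS_m^{\mathsf T})\\ &= c_1\cdot A\cdot S_1^{\mathsf T} + c_2\cdot A\cdot S_2^{\mathsf T} + \cdots + c_m\cdot A\cdot S_m^{\mathsf T}\\ &\overset{(2.4)}{=} c_1\cdot 0_{r\times 1} + c_2\cdot 0_{r\times 1} + \cdots + c_m\cdot 0_{r\times 1}\\ &= 0_{r\times 1}, \end{aligned}$$

and so $X \in \mathrm{Sol}(A)$. □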
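The construction in this proof can be sketched in Python with exact Gaussian elimination over the rationals. The helper names `rref` and `matrix_for_subspace` are ours. The sketch takes the rows of $A$ to be a basis of all solutions of (2.6), which is one concrete way to make the rows span that solution set. It assumes the basis of $U$ is given as a list of coordinate tuples; any spanning set of $U$ would work equally well.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]           # make the pivot equal to 1
        for i in range(len(m)):                    # clear column c in all other rows
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def matrix_for_subspace(basis, n):
    """Rows of a matrix A with Sol(A) = span(basis), for vectors in Q^n.

    Each row a satisfies a . S_k = 0 for every basis vector S_k (condition (2.6)),
    and the rows form a basis of all such a.
    """
    if not basis:                                  # U = {0}: A = I_n works
        return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    m, pivots = rref(basis)
    rows = []
    for f in range(n):
        if f in pivots:
            continue                               # only free columns give rows
        a = [Fraction(0)] * n
        a[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            a[pc] = -m[i][f]
        rows.append(a)
    return rows or [[Fraction(0)] * n]             # U = R^n: a single zero row

A = matrix_for_subspace([(1, 0, 0), (0, 1, 0)], 3)   # the subspace V from the text
print([[int(x) for x in row] for row in A])          # [[0, 0, 1]], i.e. x3 = 0
```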

Definition 2.2.1.25. Let $A$ be an $n\times m$ matrix $[a_{ij}]_{i=1,\ldots,n,\ j=1,\ldots,m}$. Let $A_i = (a_{i1}, a_{i2}, \ldots, a_{im})^{\mathsf T}$ for $i = 1, \ldots, n$, and let $U$ be the subspace of $\mathbb{R}^m$ that is spanned by $\{A_1, A_2, \ldots, A_n\}$. We define the rank of the matrix $A$ as

$$\mathrm{rank}(A) = \dim(U).$$

Obviously, $\mathrm{rank}(I_n) = n$ for every positive integer $n$. The rank of the following matrix

$$M = \begin{pmatrix} 1 & 2 & 0 & 3\\ 2 & 1 & 1 & 0\\ 1 & 0 & 0 & 1\\ 0 & 1 & -1 & 4 \end{pmatrix}$$

is three. This is because

$$(0, 1, -1, 4) = 1\cdot(1, 2, 0, 3) - 1\cdot(2, 1, 1, 0) + 1\cdot(1, 0, 0, 1),$$

and the set of vectors $\{(1, 2, 0, 3), (2, 1, 1, 0), (1, 0, 0, 1)\}$ is linearly independent.
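Computing a rank by hand quickly becomes tedious. A common alternative is Gaussian elimination, sketched here in Python under our own naming. Elementary row operations do not change the subspace spanned by the rows, so the number of pivots found equals $\mathrm{rank}(A)$ in the sense of Definition 2.2.1.25.

```python
from fractions import Fraction

def rank(matrix):
    """Number of pivots found by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0                                   # number of pivots found so far
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):      # eliminate the entries below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

M = [[1, 2, 0, 3], [2, 1, 1, 0], [1, 0, 0, 1], [0, 1, -1, 4]]
print(rank(M))                                                     # 3
print(rank([[int(i == j) for j in range(4)] for i in range(4)]))   # 4
```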

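Exercise 2.2.1.26. Let $A = [a_{ij}]_{i=1,\ldots,n,\ j=1,\ldots,m}$, $n, m \in \mathbb{N} - \{0\}$. Let $S_i = (a_{i1}, a_{i2}, \ldots, a_{im})^{\mathsf T}$ for $i = 1, \ldots, n$, and let $C_j = (a_{1j}, a_{2j}, \ldots, a_{nj})^{\mathsf T}$ for $j = 1, \ldots, m$. Let $U$ be the subspace of $\mathbb{R}^m$ spanned by $\{S_1, \ldots, S_n\}$ and $V$ be the subspace of $\mathbb{R}^n$ that is spanned by $\{C_1, \ldots, C_m\}$. Prove that

$$\dim(U) = \dim(V).$$

□

Exercise 2.2.1.27. Let $B = [b_{ij}]_{i=1,\ldots,m,\ j=1,\ldots,n}$ be an $m\times n$ matrix, $n, m \in \mathbb{N} - \{0\}$. Prove that

$$\dim(\mathrm{Sol}(B)) = n - \mathrm{rank}(B).$$

□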

Definition 2.2.1.28. Let $U$ be a subspace of $\mathbb{R}^n$, $n \in \mathbb{N} - \{0\}$, and let $C \in \mathbb{R}^n$. The vector set

$$V = \{X \in \mathbb{R}^n \mid X = C + Y,\ Y \in U\}$$

is called an affine subspace of $\mathbb{R}^n$ translated by $C$ from $U$.

Observation 2.2.1.29. Let $U$ be a subspace of $\mathbb{R}^n$, $n \in \mathbb{N} - \{0\}$. For each $C \in \mathbb{R}^n$,

$$\dim(\{X \in \mathbb{R}^n \mid X = C + Y,\ Y \in U\}) = \dim(U).$$

The following theorem presents the main relation between systems of linear equations and affine subspaces. We omit its technical proof because it does not contain any idea that would be interesting for algorithm design in this book.

Theorem 2.2.1.30. Let $n$ be a positive integer, and let $U$ be a subset of $\mathbb{R}^n$. $U$ is an affine subspace of $\mathbb{R}^n$ iff there exist $m \in \mathbb{N}$, $m \le n$, an $m\times n$ matrix $A$, and a vector $b \in \mathbb{R}^m$ such that $U = \mathrm{Sol}(A, b)$.

In $\mathbb{R}^3$ the set of all solutions of a nontrivial linear equation is a two-dimensional affine subspace of $\mathbb{R}^3$, also called a plane. Figure 2.4 contains the plane corresponding to the linear equation $x_1 + x_2 + 2x_3 = 4$. This plane is unambiguously given by the three points $(4, 0, 0)$, $(0, 4, 0)$, $(0, 0, 2)$ in which it crosses the axes of $\mathbb{R}^3$. The line $x_1 + x_2 = 4$ is the intersection of the plane $x_3 = 0$ with the plane $x_1 + x_2 + 2x_3 = 4$. The line $x_2 + 2x_3 = 4$ is the intersection of the plane $x_1 = 0$ and the plane $x_1 + x_2 + 2x_3 = 4$. The line $x_1 + 2x_3 = 4$ is the intersection of the plane $x_2 = 0$ and the plane $x_1 + x_2 + 2x_3 = 4$.

A transparent way to work with vector subspaces is to view them as convex 
sets. 

Definition 2.2.1.31. Let $X$ and $Y$ be two points in $\mathbb{R}^n$. A convex combination of $X$ and $Y$ is any point

$$Z = c\cdot X + (1 - c)\cdot Y$$

for any real number $c$, $0 \le c \le 1$. If $c \notin \{0, 1\}$, then we say that $Z$ is a strict convex combination of $X$ and $Y$.

Observe that the set

$$\mathrm{Convex}(X, Y) = \{Z \in \mathbb{R}^n \mid Z = c\cdot X + (1 - c)\cdot Y \text{ for some } c \in \mathbb{R},\ 0 \le c \le 1\}$$

is exactly the set of points of the line segment that connects $X$ and $Y$.

Definition 2.2.1.32. Let $n$ be a positive integer. A set $S \subseteq \mathbb{R}^n$ is convex if, for all $X, Y \in S$, $S$ contains all convex combinations of $X$ and $Y$ (i.e., $\mathrm{Convex}(X, Y) \subseteq S$ for all $X, Y \in S$).

Trivial examples of convex sets in $\mathbb{R}^n$ are $\mathbb{R}^n$, the empty set, and any singleton set. The set of solutions of a linear equation $a_1x_1 + a_2x_2 = b$ in $\mathbb{R}^2$ is convex. An important property of convex sets is expressed in the following assertion.

Theorem 2.2.1.33. The intersection of any number of convex sets is a convex 
set. 

Proof. Let $\bigcap_{i\in I} S_i$ be the intersection of convex sets $S_i$ for $i \in I$. If $X, Y \in \bigcap_{i\in I} S_i$, then $X$ and $Y$ are in every $S_i$. Any convex combination of $X$ and $Y$ is then in $S_i$ for every $i \in I$, and therefore in $\bigcap_{i\in I} S_i$. □

Now we show that the set of solutions of any linear equation

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$$

of $n$ unknowns $x_1, \ldots, x_n$ is a convex set in $\mathbb{R}^n$. Let $Y = (y_1, \ldots, y_n)^{\mathsf T}$ and $W = (w_1, \ldots, w_n)^{\mathsf T}$ be arbitrary solutions of this linear equation, i.e.,

$$\sum_{i=1}^{n} a_iy_i = \sum_{i=1}^{n} a_iw_i = b.$$

Let

$$Z = c\cdot Y + (1 - c)\cdot W = (cy_1 + (1 - c)w_1, \ldots, cy_n + (1 - c)w_n)^{\mathsf T}$$

be an arbitrary convex combination of $Y$ and $W$, $0 \le c \le 1$. We prove that $Z$ is a solution to the linear equation

$$\sum_{i=1}^{n} a_ix_i = b,$$

too:

$$\begin{aligned} (a_1, \ldots, a_n)\cdot Z &= \sum_{i=1}^{n} a_i(cy_i + (1 - c)w_i)\\ &= c\cdot\sum_{i=1}^{n} a_iy_i + (1 - c)\cdot\sum_{i=1}^{n} a_iw_i\\ &= c\cdot b + (1 - c)\cdot b = b. \end{aligned}$$

$\mathrm{Sol}(A, b)$ is the intersection of the solution sets of the individual equations of $A\cdot X = b$. Hence the above fact and Theorem 2.2.1.33 imply that $\mathrm{Sol}(A, b)$ is a convex set for every system of linear equations $A\cdot X = b$.
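The calculation can be spot-checked numerically. The Python sketch below forms several convex combinations of two solutions and confirms that each still satisfies the equation, using `Fraction` to avoid rounding. The helper names are ours. The sample points are two of the axis intersections of the plane $x_1 + x_2 + 2x_3 = 4$ mentioned earlier.

```python
from fractions import Fraction

def convex_combination(c, Y, W):
    """Z = c*Y + (1 - c)*W; requires 0 <= c <= 1."""
    if not 0 <= c <= 1:
        raise ValueError("c must lie in [0, 1]")
    return tuple(c * y + (1 - c) * w for y, w in zip(Y, W))

a, b = (1, 1, 2), 4               # the equation x1 + x2 + 2*x3 = 4
Y, W = (4, 0, 0), (0, 0, 2)       # two of its solutions
cs = [Fraction(i, 7) for i in range(8)]           # c = 0, 1/7, ..., 1
print(all(sum(ai * zi for ai, zi in zip(a, convex_combination(c, Y, W))) == b
          for c in cs))           # True
```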

2.2.2 Combinatorics, Counting, and Graph Theory 

The aim of this section is to give the definitions of some fundamental objects 
of combinatorics and graph theory, and to present a few elementary results 
about them. The terms introduced in this section are useful for the analy- 
sis of algorithms as well as for the representation of discrete objects in the 
subsequent chapters. 

More precisely, we first introduce the fundamental categories of combinatorics such as permutation and combination and learn to work with them. Then we define the $O$, $\Omega$, and $\Theta$ notation for the study of the asymptotic behavior of functions, and present the sums of some fundamental series and a simplified version of the Master Theorem for solving some specific recurrences. After that we give the fundamental notions of graph theory such as graphs, multigraphs, directed graphs, planarity, connectivity, matching, cut, etc.

First, we define the basic terms of permutation and combination. The 
starting point is to have a set of objects that are distinguishable from each 
other. 

Definition 2.2.2.1. Let $n$ be a positive integer. Let $S = \{a_1, a_2, \ldots, a_n\}$ be a set of $n$ objects (elements). A permutation of $n$ objects $a_1, \ldots, a_n$ is an ordered arrangement of the objects of $S$.

For instance, if $S = \{a_1, a_2, a_3\}$, then there are the following six different ways to arrange the three objects $a_1, a_2, a_3$:

$$(a_1, a_2, a_3),\ (a_1, a_3, a_2),\ (a_2, a_1, a_3),\ (a_2, a_3, a_1),\ (a_3, a_1, a_2),\ (a_3, a_2, a_1).$$

To denote permutations the simple notation $(i_1, i_2, \ldots, i_n)$ is often used instead of $(a_{i_1}, a_{i_2}, \ldots, a_{i_n})$.







Lemma 2.2.2.2. For every positive integer $n$, the number $n!$ of different permutations of $n$ objects is

$$n! = n\cdot(n - 1)\cdot(n - 2)\cdots 2\cdot 1 = \prod_{i=1}^{n} i.$$

Proof. The first object in any permutation may be chosen in $n$ different ways (i.e., from $n$ different objects). Once the first object has been chosen, the second object may be chosen in $n - 1$ different ways (i.e., from the $n - 1$ remaining objects), etc. □

By convention, we set $0! = 1$.
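Python's standard library enumerates permutations directly. The sketch below lists the arrangements of $\{a_1, a_2, a_3\}$ with `itertools.permutations` and compares their number with `math.factorial`. The string labels `"a1"`, `"a2"`, `"a3"` are our stand-ins for the objects.

```python
from itertools import permutations
from math import factorial

S = ["a1", "a2", "a3"]
perms = list(permutations(S))   # every ordered arrangement of the objects of S
for p in perms:
    print(p)                    # ('a1', 'a2', 'a3'), ('a1', 'a3', 'a2'), ...
print(len(perms), factorial(len(S)))  # 6 6
```

`itertools.permutations` emits the tuples in lexicographic order of the input positions, which is the order in which the six arrangements are listed in the text.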

Definition 2.2.2.3. Let $k$ and $n$ be non-negative integers, $k \le n$. A combination of $k$ objects from $n$ objects is a selection of $k$ objects without regard to order.

A combination of four objects from $\{a_1, a_2, a_3, a_4, a_5\}$ is one of the following sets:

$$\{a_1, a_2, a_3, a_4\},\ \{a_1, a_2, a_3, a_5\},\ \{a_1, a_2, a_4, a_5\},\ \{a_1, a_3, a_4, a_5\},\ \{a_2, a_3, a_4, a_5\}.$$

Lemma 2.2.2.4. Let $n$ and $k$ be non-negative integers, $k \le n$. The number $\binom{n}{k}$ of combinations of $k$ objects from $n$ objects is

$$\binom{n}{k} = \frac{n\cdot(n - 1)\cdot(n - 2)\cdots(n - k + 1)}{k!} = \frac{n!}{k!\cdot(n - k)!}.$$

Proof. Similarly to the proof of Lemma 2.2.2.2, we have $n$ possibilities for the choice of the first element, $n - 1$ possibilities for the choice of the second element, etc. Hence, there are

$$n\cdot(n - 1)\cdot(n - 2)\cdots(n - k + 1)$$

ways of choosing $k$ objects from $n$ objects when order is taken into account. But any order of these $k$ elements provides the same set of $k$ elements, and by Lemma 2.2.2.2 there are exactly $k!$ such orders. So,

$$\binom{n}{k} = \frac{n\cdot(n - 1)\cdot(n - 2)\cdots(n - k + 1)}{k!}.$$

□

Observe that $\binom{n}{0} = \binom{n}{n} = 1$.

Corollary 2.2.2.5. For all non-negative integers $k$ and $n$, $k \le n$,

$$\binom{n}{k} = \binom{n}{n - k}.$$



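Lemma 2.2.2.6. For all positive integers $k$ and $n$, $k < n$,

$$\binom{n}{k} = \binom{n - 1}{k - 1} + \binom{n - 1}{k}.$$

Proof.

$$\begin{aligned} \binom{n - 1}{k - 1} + \binom{n - 1}{k} &= \frac{(n - 1)!}{(k - 1)!\cdot(n - k)!} + \frac{(n - 1)!}{k!\cdot(n - k - 1)!}\\ &= \frac{k\cdot(n - 1)! + (n - k)\cdot(n - 1)!}{k!\cdot(n - k)!}\\ &= \frac{n\cdot(n - 1)!}{k!\cdot(n - k)!} = \frac{n!}{k!\cdot(n - k)!} = \binom{n}{k}. \end{aligned}$$

□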
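Lemma 2.2.2.6 together with $\binom{m}{0} = \binom{m}{m} = 1$ is enough to compute every binomial coefficient with additions only; this is Pascal's triangle. Here is a small Python sketch, with our own function name, compared against the standard-library `math.comb` (available since Python 3.8).

```python
from math import comb

def pascal_row(n):
    """Row n of Pascal's triangle, built only with Lemma 2.2.2.6 and C(m,0) = C(m,m) = 1."""
    row = [1]                      # row 0
    for m in range(1, n + 1):
        # C(m, k) = C(m-1, k-1) + C(m-1, k) for 0 < k < m
        row = [1] + [row[k - 1] + row[k] for k in range(1, m)] + [1]
    return row

print(pascal_row(5))                                       # [1, 5, 10, 10, 5, 1]
print(pascal_row(10) == [comb(10, k) for k in range(11)])  # True
```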



The values $\binom{n}{k}$ are also known as the binomial coefficients from the following theorem.

Theorem 2.2.2.7 (Newton's Theorem). For every positive integer $n$,

$$(1 + x)^n = \sum_{k=0}^{n}\binom{n}{k}\cdot x^k.$$

Exercise 2.2.2.8. Prove Newton's Theorem. □

Lemma 2.2.2.9. For every positive integer $n$,

$$\sum_{k=0}^{n}\binom{n}{k} = 2^n.$$

Proof. To prove Lemma 2.2.2.9 it is sufficient to set $x = 1$ in Newton's Theorem. Another argument is that, for each $k \in \{0, 1, \ldots, n\}$, $\binom{n}{k}$ is the number of all $k$-element subsets of an $n$-element set. So, $\sum_{k=0}^{n}\binom{n}{k}$ counts the number of all subsets of a set of $n$ elements, and every set of $n$ elements has exactly $2^n$ different subsets. □
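Both arguments for Lemma 2.2.2.9 can be mirrored in a few lines of Python. The sketch below counts $k$-element subsets by explicit enumeration with `itertools.combinations`. It also evaluates both sides of the binomial identity $(1 + x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k$ at a sample integer $x$. The helper name is ours.

```python
from itertools import combinations
from math import comb

def subset_counts(n):
    """Number of k-element subsets of {1, ..., n} for k = 0..n, by enumeration."""
    return [sum(1 for _ in combinations(range(1, n + 1), k)) for k in range(n + 1)]

n = 6
counts = subset_counts(n)
print(counts)                                  # [1, 6, 15, 20, 15, 6, 1]
print(sum(counts) == 2 ** n)                   # True
x = 3
print((1 + x) ** n == sum(comb(n, k) * x ** k for k in range(n + 1)))  # True
```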

Exercise 2.2.2.10. Prove, for every integer n > 3, 




□ 







Next, we fix some fundamental notations concerning elementary functions, 
and look briefly at the asymptotic behavior of functions. 

For any positive real number $x$, we denote the greatest integer less than or equal to $x$ by $\lfloor x\rfloor$ and call it the floor of $x$. For $x \in \mathbb{R}^+$, the ceiling of $x$, denoted by $\lceil x\rceil$, is the least integer greater than or equal to $x$. We observe that

$$x - 1 < \lfloor x\rfloor \le x \le \lceil x\rceil < x + 1$$

for each $x \in \mathbb{R}^+$.

Let $x$ be a variable, and let $d$ be a positive integer. A polynomial in $x$ of degree $d$ is a function $p(x)$ of the form

$$p(x) = \sum_{i=0}^{d} a_ix^i,$$

where the constants $a_0, a_1, \ldots, a_d$ are the coefficients of the polynomial and $a_d \neq 0$.

We use $e$ to denote $\lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n$. Note that $e$ is the base of the natural logarithm function.

Exercise 2.2.2.11. Prove that, for all reals $x$,

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{i=0}^{\infty}\frac{x^i}{i!}.$$

□



The assertion of Exercise 2.2.2.11 implies

$$1 + x \le e^x \le 1 + x + x^2$$

for every $x \in [-1, 1]$. Note that $e = 2.7182\ldots$, and one can approximate $e$ with arbitrary precision by using the equation of Exercise 2.2.2.11.
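For instance, the partial sums of the series in Exercise 2.2.2.11 converge to $e$ very quickly, because the terms shrink like $1/i!$. This Python sketch builds each term from the previous one and compares the result with `math.e`. It then spot-checks the two-sided bound $1 + x \le e^x \le 1 + x + x^2$ on a grid of points in $[-1, 1]$. The function and the number of terms are our own choices.

```python
import math

def exp_partial_sum(x, terms=30):
    """Sum of the first `terms` terms x**i / i! of the series for e**x."""
    total, term = 0.0, 1.0          # term holds x**i / i!, starting at i = 0
    for i in range(terms):
        total += term
        term *= x / (i + 1)         # turn x**i / i! into x**(i+1) / (i+1)!
    return total

print(abs(exp_partial_sum(1.0) - math.e) < 1e-12)   # True
grid = [i / 10 for i in range(-10, 11)]
print(all(1 + x <= math.exp(x) <= 1 + x + x * x for x in grid))  # True
```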

Exercise 2.2.2.12. Prove that for all real $x$,

$$\lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x.$$

□

In this book we use $\log n$ to denote the binary logarithm $\log_2 n$ and $\ln n = \log_e n$ to denote the natural logarithm. The equalities of the following exercise provide the elementary rules for working with logarithmic functions.

Exercise 2.2.2.13. Prove, for all positive reals $a$, $b$, $c$ and $n$ with $b \neq 1$ and $c \neq 1$,

(i) $\log_c(ab) = \log_c(a) + \log_c(b)$,

(ii) $\log_c a^n = n\cdot\log_c a$,

(iii) $a^{\log_b n} = n^{\log_b a}$, and

(iv) $\log_b a = \dfrac{\log_c a}{\log_c b}$. □

In algorithmics we work with functions from $\mathbb{N}$ to $\mathbb{N}$ in order to measure complexity according to the input size. Here, we are often concerned with how the complexity (running time, for instance) increases with the input size in the limit, as the size of the input increases without bound. In this case we say that we are studying the asymptotic efficiency of algorithms. This rough characterization of complexity by its order of growth is usually sufficient to determine the threshold on the input size above which the algorithm is not applicable because its complexity is too large. In what follows we define the standard asymptotic notation used in algorithmics.

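Definition 2.2.2.14. Let $f: \mathbb{N} \to \mathbb{R}^{\ge 0}$ be a function. We define

$$O(f(n)) = \{t: \mathbb{N} \to \mathbb{R}^{\ge 0} \mid \exists c, n_0 \in \mathbb{N} \text{ such that } \forall n \in \mathbb{N},\ n \ge n_0:\ t(n) \le c\cdot f(n)\},$$

$$\Omega(f(n)) = \{g: \mathbb{N} \to \mathbb{R}^{\ge 0} \mid \exists d \in \mathbb{N} - \{0\},\ n_0 \in \mathbb{N} \text{ such that } \forall n \in \mathbb{N},\ n \ge n_0:\ g(n) \ge \tfrac{1}{d}\cdot f(n)\},$$

$$\begin{aligned} \Theta(f(n)) &= O(f(n)) \cap \Omega(f(n))\\ &= \{h: \mathbb{N} \to \mathbb{R}^{\ge 0} \mid \exists c_1 \in \mathbb{N} - \{0\},\ c_2, n_0 \in \mathbb{N} \text{ such that } \forall n \in \mathbb{N},\ n \ge n_0:\ \tfrac{1}{c_1}\cdot f(n) \le h(n) \le c_2\cdot f(n)\}. \end{aligned}$$

If $t(n) \in O(f(n))$ we say that $t$ does not grow asymptotically faster than $f$. If $g(n) \in \Omega(f(n))$ we say that $g$ grows asymptotically at least as fast as $f$. If $h(n) \in \Theta(f(n))$ we say that $h$ and $f$ are asymptotically equivalent.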

Exercise 2.2.2.15. Let $p(n) = a_0 + a_1n + a_2n^2 + \cdots + a_dn^d$ be a polynomial in $n$ for some positive integer $d$ and a positive real number $a_d$. Prove that

$$p(n) \in \Theta(n^d).$$

□

In the literature one usually uses the notation $t(n) = O(f(n))$, $g(n) = \Omega(f(n))$, and $h(n) = \Theta(f(n))$ instead of $t(n) \in O(f(n))$, $g(n) \in \Omega(f(n))$, and $h(n) \in \Theta(f(n))$, respectively.

Exercise 2.2.2.16. Which of the following statements are true? Prove your answers.

(i) $2^n \in \Theta(2^{n+a})$ for any positive integer (constant) $a$.

(ii) $2^{b\cdot n} \in \Theta(2^n)$ for any positive integer (constant) $b$.

(iii) $\log_b n \in \Theta(\log_c n)$ for all $b, c \in \mathbb{R}^{>1}$.

(iv) $(n + 1)! \in O(n!)$.

(v) $\log_2(n!) \in \Theta(n\cdot\log n)$.

□

In the next part of this section we remind the reader of some elementary series and their sums. For any function $f: \mathbb{N} - \{0\} \to \mathbb{R}$, one can define

$$\mathrm{Sum}_f(n) = \sum_{i=1}^{n} f(i) = f(1) + f(2) + \cdots + f(n).$$

$\mathrm{Sum}_f(n)$ is called a series of $f$. In what follows we consider only some fundamental kinds of series.

Definition 2.2.2.17. Let $a$, $b$, and $d$ be some constants. For every function $f: \mathbb{N} \to \mathbb{R}$ defined by $f(n) = a + (n - 1)\cdot d$, $\mathrm{Sum}_f(n)$ is called an arithmetic series. For every function $h: \mathbb{N} \to \mathbb{R}$ defined by $h(n) = a\cdot b^{n-1}$, $\mathrm{Sum}_h(n)$ is called a geometric series.

Lemma 2.2.2.18. Let $a$ and $d$ be some constants. Then

$$\mathrm{Sum}_{a+(n-1)\cdot d}(n) = \sum_{i=1}^{n}(a + (i - 1)\cdot d) = an + \frac{d\cdot(n - 1)\cdot n}{2}.$$
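Lemma 2.2.2.18 is stated without proof; it follows from $\sum_{i=1}^{n}(i - 1) = \frac{(n - 1)n}{2}$. Here is a quick Python sanity check of the closed form against direct summation for small integer parameters (the function names are ours).

```python
def arithmetic_sum(a, d, n):
    """Sum_{i=1}^{n} (a + (i - 1) * d), computed term by term."""
    return sum(a + (i - 1) * d for i in range(1, n + 1))

def arithmetic_closed_form(a, d, n):
    """a*n + d*(n - 1)*n / 2; (n - 1)*n is even, so // is exact for integer d."""
    return a * n + d * (n - 1) * n // 2

print(all(arithmetic_sum(a, d, n) == arithmetic_closed_form(a, d, n)
          for a in range(-3, 4) for d in range(-3, 4) for n in range(20)))  # True
```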




Lemma 2.2.2.19. Let $a$ and $b$ be some constants, $b \in \mathbb{R}^+$, $b \neq 1$. Then

$$\mathrm{Sum}_{a\cdot b^{n-1}}(n) = \sum_{i=1}^{n} a\cdot b^{i-1} = a\cdot\frac{1 - b^n}{1 - b}.$$







Proof.

$$\mathrm{Sum}_{a\cdot b^{n-1}}(n) = \sum_{i=1}^{n} a\cdot b^{i-1} = a\cdot(1 + b + b^2 + \cdots + b^{n-1}). \qquad (2.7)$$

Thus,

$$b\cdot\mathrm{Sum}_{a\cdot b^{n-1}}(n) = a\cdot(b + b^2 + \cdots + b^n). \qquad (2.8)$$

Subtracting the equality (2.8) from the equality (2.7) we obtain

$$(1 - b)\cdot\mathrm{Sum}_{a\cdot b^{n-1}}(n) = a\cdot(1 - b^n),$$

which directly implies the assertion of Lemma 2.2.2.19. □

Exercise 2.2.2.20. Prove that, for any $b \in (0, 1)$,

$$\lim_{n\to\infty}\mathrm{Sum}_{a\cdot b^{n-1}}(n) = \sum_{i=1}^{\infty} a\cdot b^{i-1} = a\cdot\frac{1}{1 - b}.$$

□
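Both the closed form of Lemma 2.2.2.19 and the limit in Exercise 2.2.2.20 are easy to test with exact rational arithmetic. In this sketch the partial sums for $b = 1/2$ approach $a/(1 - b) = 2a$, and the gap after $n$ terms is exactly $a\cdot b^n/(1 - b)$. The function names are ours.

```python
from fractions import Fraction

def geometric_sum(a, b, n):
    """Sum_{i=1}^{n} a * b**(i - 1), term by term."""
    return sum(a * b ** (i - 1) for i in range(1, n + 1))

def geometric_closed_form(a, b, n):
    """a * (1 - b**n) / (1 - b) from Lemma 2.2.2.19 (requires b != 1)."""
    b = Fraction(b)
    return a * (1 - b ** n) / (1 - b)

print(geometric_sum(3, 2, 10), geometric_closed_form(3, 2, 10))  # 3069 3069
half = Fraction(1, 2)
print(2 - geometric_closed_form(1, half, 20))  # 1/524288, i.e. 2**-19
```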

Definition 2.2.2.21. For every positive integer $n$, the $n$th harmonic number is defined by the series

$$\mathrm{Har}(n) = \sum_{i=1}^{n}\frac{1}{i} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}.$$



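First, we observe that $\mathrm{Har}(n)$ tends to infinity with growing $n$. The simplest way to see it is to partition the terms of the harmonic series into infinitely many groups. Group $k$ contains the $2^{k-1}$ terms $\frac{1}{2^{k-1}}, \ldots, \frac{1}{2^k - 1}$:

$$\underbrace{1}_{\text{group } 1} + \underbrace{\frac{1}{2} + \frac{1}{3}}_{\text{group } 2} + \underbrace{\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7}}_{\text{group } 3} + \underbrace{\frac{1}{8} + \frac{1}{9} + \cdots + \frac{1}{15}}_{\text{group } 4} + \cdots$$

Both terms in group 2 are between $\frac{1}{4}$ and $\frac{1}{2}$, and so the sum of that group is between $2\cdot\frac{1}{4} = \frac{1}{2}$ and $2\cdot\frac{1}{2} = 1$. All four terms in group 3 are between $\frac{1}{8}$ and $\frac{1}{4}$, and so their sum is also between $4\cdot\frac{1}{8} = \frac{1}{2}$ and $4\cdot\frac{1}{4} = 1$. In general, for every positive integer $k$, all $2^{k-1}$ terms of group $k$ are between $2^{-k}$ and $2^{-k+1}$. Hence the sum of the terms of group $k$ is between $\frac{1}{2} = 2^{k-1}\cdot 2^{-k}$ and $1 = 2^{k-1}\cdot 2^{-k+1}$.

This grouping procedure shows us the following. If $\frac{1}{n}$ is in group $k$ (i.e., $k = \lfloor\log_2 n\rfloor + 1$), then $\mathrm{Har}(n) > k/2$ and $\mathrm{Har}(n) \le k$. Thus,

$$\frac{\lfloor\log_2 n\rfloor + 1}{2} < \mathrm{Har}(n) \le \lfloor\log_2 n\rfloor + 1.$$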
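The bounds can be checked exactly for small $n$. In this Python sketch, the index of the group containing $1/n$ is $k = \lfloor\log_2 n\rfloor + 1$, which for a positive integer equals `n.bit_length()`. `Fraction` avoids any rounding in $\mathrm{Har}(n)$. The function name is ours.

```python
from fractions import Fraction

def harmonic_bounds_hold(limit):
    """Check (floor(log2 n) + 1)/2 < Har(n) <= floor(log2 n) + 1 for n = 1..limit."""
    har = Fraction(0)
    for n in range(1, limit + 1):
        har += Fraction(1, n)          # har now equals Har(n)
        k = n.bit_length()             # k = floor(log2 n) + 1 for n >= 1
        if not Fraction(k, 2) < har <= k:
            return False
    return True

print(harmonic_bounds_hold(500))  # True
```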







Exercise 2.2.2.22. Prove that 

$$\mathrm{Har}(n) = \ln n + O(1).$$



□ 



Definition 2.2.2.23. For any sequence $a_0, a_1, \ldots, a_n$, the sums $\sum_{k=1}^{n}(a_k - a_{k-1})$ and $\sum_{i=1}^{n-1}(a_i - a_{i+1})$ are called telescoping series.

Obviously,

$$\sum_{k=1}^{n}(a_k - a_{k-1}) = a_n - a_0,$$

since each of the terms $a_1, a_2, \ldots, a_{n-1}$ is added exactly once and subtracted out exactly once. Analogously,

$$\sum_{i=1}^{n-1}(a_i - a_{i+1}) = a_1 - a_n.$$

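The reason to consider telescoping series is that one can easily simplify a series if one recognizes that it is telescoping. For instance, consider the series

$$\sum_{k=1}^{n-1}\frac{1}{k(k+1)}.$$

Since $\frac{1}{k(k+1)} = \frac{1}{k} - \frac{1}{k+1}$, we get

$$\sum_{k=1}^{n-1}\frac{1}{k(k+1)} = \sum_{k=1}^{n-1}\left(\frac{1}{k} - \frac{1}{k+1}\right) = 1 - \frac{1}{n}.$$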
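A quick exact check of the telescoping example in Python (the function name is ours). Summing the original terms $\frac{1}{k(k+1)}$ with `Fraction` gives exactly $1 - \frac{1}{n}$.

```python
from fractions import Fraction

def telescoping_example(n):
    """Sum_{k=1}^{n-1} 1/(k(k+1)), evaluated exactly."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n))

print(telescoping_example(10))  # 9/10
print(all(telescoping_example(n) == 1 - Fraction(1, n) for n in range(1, 60)))  # True
```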



Analyzing the complexity of algorithms one often reduces the analysis to solving a specific recurrence. The typical recurrences are of the form

$$T(n) = a\cdot T\left(\frac{n}{c}\right) + f(n),$$

where $a$ and $c$ are positive integers and $f$ is a function from $\mathbb{N}$ to $\mathbb{R}^+$. In what follows we give a general solution of this recurrence if $f(n) \in \Theta(n)$.

Theorem 2.2.2.24 (Master Theorem). Let $a$, $b$, and $c$ be positive integers, $c \ge 2$. Let

$$T(1) = 0,\qquad T(n) = a\cdot T\left(\frac{n}{c}\right) + b\cdot n.$$

Then,

$$T(n) \in \begin{cases} \Theta(n) & \text{if } a < c,\\ \Theta(n\log n) & \text{if } a = c,\\ \Theta(n^{\log_c a}) & \text{if } c < a. \end{cases}$$




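Proof. For simplicity, we assume $n = c^k$ for some positive integer $k$. Then

$$\begin{aligned} T(n) &= a\cdot T\left(\frac{n}{c}\right) + b\cdot n\\ &= a\cdot\left(a\cdot T\left(\frac{n}{c^2}\right) + b\cdot\frac{n}{c}\right) + b\cdot n\\ &= a^2\cdot T\left(\frac{n}{c^2}\right) + b\cdot n\cdot\left(\frac{a}{c} + 1\right)\\ &\;\;\vdots\\ &= a^{\log_c n}\cdot T(1) + b\cdot n\cdot\sum_{i=0}^{\log_c n - 1}\left(\frac{a}{c}\right)^i\\ &= b\cdot n\cdot\sum_{i=0}^{\log_c n - 1}\left(\frac{a}{c}\right)^i. \end{aligned}$$

Now we distinguish the three cases according to the relation between $a$ and $c$.

(1) Let $a < c$. Then, following Exercise 2.2.2.20, we obtain

$$\sum_{i=0}^{\log_c n - 1}\left(\frac{a}{c}\right)^i \le \sum_{i=0}^{\infty}\left(\frac{a}{c}\right)^i = \frac{1}{1 - \frac{a}{c}} \in O(1).$$

Since this sum is also at least 1 (its term for $i = 0$), $T(n) \in \Theta(n)$.

(2) Let $a = c$. Obviously,

$$\sum_{i=0}^{\log_c n - 1}\left(\frac{a}{c}\right)^i = \log_c n \in \Theta(\log n).$$

So, $T(n) \in \Theta(n\log n)$.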

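(3) Let $a > c$. Following Lemma 2.2.2.19 we obtain

$$\sum_{i=0}^{\log_c n - 1}\left(\frac{a}{c}\right)^i = \frac{\left(\frac{a}{c}\right)^{\log_c n} - 1}{\frac{a}{c} - 1} \in \Theta\left(\left(\frac{a}{c}\right)^{\log_c n}\right) = \Theta\left(\frac{n^{\log_c a}}{n}\right),$$

because $\left(\frac{a}{c}\right)^{\log_c n} = \frac{a^{\log_c n}}{c^{\log_c n}} = \frac{n^{\log_c a}}{n}$ by Exercise 2.2.2.13 (iii). Thus, $T(n) \in \Theta(n^{\log_c a})$. □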
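For powers of $c$ the recurrence can simply be evaluated. The Python sketch below computes $T(n)$ recursively and compares it with the unrolled form $b\cdot n\cdot\sum_{i=0}^{k-1}(a/c)^i$ for $n = c^k$. The function names are ours. With $c = 2$ and $b = 1$, the three calls illustrate the cases $a < c$, $a = c$, and $a > c$.

```python
from fractions import Fraction

def T(n, a, b, c):
    """T(1) = 0 and T(n) = a*T(n/c) + b*n, for n a power of c."""
    if n == 1:
        return 0
    return a * T(n // c, a, b, c) + b * n

def unrolled(k, a, b, c):
    """b * n * sum_{i=0}^{k-1} (a/c)**i with n = c**k."""
    n = c ** k
    return b * n * sum(Fraction(a, c) ** i for i in range(k))

n = 2 ** 10
print(T(n, 1, 1, 2), 2 * n - 2)     # a < c, linear: 2046 2046
print(T(n, 2, 1, 2), 10 * n)        # a = c, n*log2(n): 10240 10240
print(T(n, 4, 1, 2), n * n - n)     # a > c, order n**log2(4) = n**2: 1047552 1047552
print(all(T(2 ** k, a, 1, 2) == unrolled(k, a, 1, 2)
          for a in (1, 2, 4) for k in range(12)))  # True
```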




In what follows we present the fundamental terminology of graph theory. 

Definition 2.2.2.25. An (undirected) graph $G$ is a pair $(V, E)$, where

(i) $V$ is a finite set called the set of vertices of $G$, and

(ii) $E$ is a subset of $\{\{u, v\} \mid v, u \in V \text{ and } v \neq u\}$ called the set of edges of $G$.

Any element of $V$ is called a vertex of $G$, and any element of $E$ is called an edge of $G$. $G$ is called a graph of $|V|$ vertices.





An example of a graph is $G' = (V, E)$, where

$$V = \{v_1, v_2, v_3, v_4, v_5\} \text{ and } E = \{\{v_1, v_3\}, \{v_1, v_4\}, \{v_1, v_5\}, \{v_2, v_4\}, \{v_2, v_5\}\}.$$

One usually represents a graph as a picture in the plane. The vertices are points (or small circles) in the plane and an edge $\{u, v\}$ is represented as a curve connecting the points $u$ and $v$. The graph $G'$ is depicted in Figure 2.5 in two different ways.

A graph $G$ is called planar if there exists a picture representation of $G$ in the plane in which the curves representing the edges do not cross in any point of the plane. The graph $G'$ defined above is planar because Figure 2.5b provides its planar representation.




Fig. 2.5. The graph $G'$ drawn in two ways, panels (a) and (b).



We observe that we have either an edge between two vertices $u$ and $v$ or no edge between $u$ and $v$. So, another suitable representation of a graph of $n$ vertices $v_1, v_2, \ldots, v_n$ is a symmetric $n\times n$ Boolean matrix $M_G = [a_{ij}]_{i,j=1,\ldots,n}$, called the adjacency matrix of $G$, where $a_{ij} = 1$ iff $\{v_i, v_j\} \in E$. The following matrix represents the graph $G'$ depicted in Figure 2.5.

$$\begin{pmatrix} 0 & 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 1 & 1\\ 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$

For any edge $\{u, v\}$ of a graph $G = (V, E)$ we say that $\{u, v\}$ is incident to the vertices $u$ and $v$. Two vertices $x$ and $y$ are adjacent if the edge $\{x, y\}$ belongs to $E$. The degree of a vertex $v$ of $G$, $\deg_G(v)$, is the number of edges incident to $v$. The degree of a graph $G = (V, E)$ is

$$\deg(G) = \max\{\deg_G(v) \mid v \in V\}.$$
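The adjacency matrix and the vertex degrees of $G'$ can be computed from its edge list. In this Python sketch, edges are written as pairs and the order of the vertex list fixes the row order; the helper name is ours. Row sums of the matrix give the degrees. The last line checks the identity $\sum_{v\in V}\deg_G(v) = 2\cdot|E|$ for this one graph.

```python
def adjacency_matrix(V, E):
    """Symmetric 0/1 matrix with entry (i, j) = 1 iff {V[i], V[j]} is an edge."""
    index = {v: i for i, v in enumerate(V)}
    M = [[0] * len(V) for _ in V]
    for u, v in E:
        M[index[u]][index[v]] = M[index[v]][index[u]] = 1
    return M

V = ["v1", "v2", "v3", "v4", "v5"]
E = [("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v2", "v4"), ("v2", "v5")]
M = adjacency_matrix(V, E)
degree = {v: sum(row) for v, row in zip(V, M)}   # deg(v) = number of 1s in v's row
print(M[0])              # [0, 0, 1, 1, 1]
print(degree)            # {'v1': 3, 'v2': 2, 'v3': 1, 'v4': 2, 'v5': 2}
print(max(degree.values()), sum(degree.values()) == 2 * len(E))  # 3 True
```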







Exercise 2.2.2.26. Prove that for every graph $G = (V, E)$,

$$\sum_{v\in V}\deg_G(v) = 2\cdot|E|.$$

□

Exercise 2.2.2.27. How many graphs of $n$ vertices $v_1, v_2, \ldots, v_n$ exist? □

Definition 2.2.2.28. Let $G = (V, E)$ be a graph. A path in $G$ is a sequence of vertices $P = v_1, v_2, \ldots, v_m$, $v_i \in V$ for $i = 1, \ldots, m$, such that $\{v_i, v_{i+1}\} \in E$ for $i = 1, \ldots, m - 1$. For $i = 1, \ldots, m$, $v_i$ is a vertex of $P$, and for $j = 1, \ldots, m - 1$, $\{v_j, v_{j+1}\}$ is an edge of $P$. The length of $P$ is the number of its edges (i.e., $m - 1$).

A path $P = v_1, v_2, \ldots, v_m$ is called simple if either all its vertices are distinct (i.e., $|\{v_1, v_2, \ldots, v_m\}| = m$) or all its vertices but $v_1$ and $v_m$ are distinct (i.e., $|\{v_1, \ldots, v_m\}| = m - 1$ and $v_1 = v_m$).

A path $P = v_1, v_2, \ldots, v_m$ is called a cycle if $v_1 = v_m$. A simple cycle is a simple path that is a cycle. A simple cycle that contains all vertices of the graph $G$ is called a Hamiltonian tour of $G$. A cycle that contains every edge of $G$ exactly once is called an Eulerian tour of $G$. If a graph contains a Hamiltonian tour, then it is called Hamiltonian.

Obviously, any path $P = v_1, v_2, \ldots, v_m$ can be viewed as a graph $(\{v_1, \ldots, v_m\}, \{\{v_1, v_2\}, \{v_2, v_3\}, \ldots, \{v_{m-1}, v_m\}\})$.

$P = v_1, v_4, v_2, v_5, v_1, v_4, v_2$ is a path of the graph $G'$ from Figure 2.5. $P$ is not simple. The path $v_4, v_2, v_5, v_1, v_4$ is a simple cycle. $G'$ contains neither an Eulerian tour nor a Hamiltonian tour.
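The conditions of Definition 2.2.2.28 translate directly into predicates. This Python sketch checks the two example sequences for $G'$. A path is a list of vertex names, and edges are stored as frozensets so that $\{u, v\} = \{v, u\}$. The function names are ours.

```python
def is_path(P, E):
    """True if consecutive vertices of P are joined by edges of E."""
    edges = {frozenset(e) for e in E}
    return len(P) >= 1 and all(frozenset((P[i], P[i + 1])) in edges
                               for i in range(len(P) - 1))

def is_simple(P):
    """All vertices distinct, or all distinct except that the first equals the last."""
    m, distinct = len(P), len(set(P))
    return distinct == m or (distinct == m - 1 and P[0] == P[-1])

def is_cycle(P):
    return P[0] == P[-1]

E = [("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v2", "v4"), ("v2", "v5")]
P = ["v1", "v4", "v2", "v5", "v1", "v4", "v2"]
C = ["v4", "v2", "v5", "v1", "v4"]
print(is_path(P, E), is_simple(P))                  # True False
print(is_path(C, E), is_simple(C), is_cycle(C))     # True True True
```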

Exercise 2.2.2.29. Show that if a graph contains a path between two vertices 
u and v , then it contains a simple path between u and v. □ 

For any $n \in \mathbb{N}$, $K_n = (\{v_1, \ldots, v_n\}, \{\{v_i, v_j\} \mid i, j \in \{1, \ldots, n\},\ i \neq j\})$ is the complete graph of $n$ vertices. Observe that $M_{K_n}$ is the 0-diagonal $n\times n$ Boolean matrix (i.e., the matrix whose diagonal elements are all 0s and whose non-diagonal elements are all 1s). A graph $G = (V, E)$ is called connected if, for all $x, y \in V$, $x \neq y$, there exists a path between $x$ and $y$ in $G$. $G$ is called bipartite if one can partition $V$ into two sets $V_1$ and $V_2$ ($V_1 \cup V_2 = V$, $V_1 \cap V_2 = \emptyset$) such that $E \subseteq \{\{u, v\} \mid u \in V_1, v \in V_2\}$. The graph $G'$ in Figure 2.5 is a connected, bipartite graph. The second property can be seen especially well in Figure 2.5a, with $V_1 = \{v_1, v_2\}$ on the left side and $V_2 = \{v_3, v_4, v_5\}$ on the right side.
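Bipartiteness and connectivity can both be tested with breadth-first search. The Python sketch below uses a standard technique rather than anything from the text, and the function names are ours. It tries to 2-colour the vertices so that every edge joins the two colour classes, and returns `None` if some edge joins two vertices of the same colour. For $G'$ the two colour classes are exactly $V_1 = \{v_1, v_2\}$ and $V_2 = \{v_3, v_4, v_5\}$.

```python
from collections import deque

def adjacency(V, E):
    adj = {v: [] for v in V}
    for u, v in E:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def two_colouring(V, E):
    """Map each vertex to 0 or 1 so that every edge joins different colours, or None."""
    adj, colour = adjacency(V, E), {}
    for s in V:                         # one BFS per connected component
        if s in colour:
            continue
        colour[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return None         # an edge inside one class: not bipartite
    return colour

def is_connected(V, E):
    """True if every vertex is reachable from the first one (V must be nonempty)."""
    adj, seen, queue = adjacency(V, E), {V[0]}, deque([V[0]])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(V)

V = ["v1", "v2", "v3", "v4", "v5"]
E = [("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v2", "v4"), ("v2", "v5")]
colour = two_colouring(V, E)
print(sorted(v for v in V if colour[v] == 0), is_connected(V, E))  # ['v1', 'v2'] True
K3 = [("v1", "v2"), ("v1", "v3"), ("v2", "v3")]
print(two_colouring(["v1", "v2", "v3"], K3))  # None
```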

Exercise 2.2.2.30. Prove that for every connected graph $G = (V, E)$,

$$|E| \ge |V| - 1.$$



□ 







Exercise 2.2.2.31. Prove that a connected graph G contains an Eulerian 
tour if and only if the degree of all vertices of G is even. □ 

Definition 2.2.2.32. A cut of a graph $G = (V, E)$ is a triple $(V_1, V_2, E')$, where

(i) $V_1 \cup V_2 = V$, $V_1 \neq \emptyset$, $V_2 \neq \emptyset$, and $V_1 \cap V_2 = \emptyset$, and

(ii) $E' = E \cap \{\{u, v\} \mid u \in V_1, v \in V_2\}$.

Obviously, to determine a cut in a given graph $G = (V, E)$ it is sufficient to give $(V_1, V_2)$ or $E'$ only. For instance, $E' = \{\{v_1, v_5\}, \{v_2, v_4\}\}$ determines the cut $(\{v_1, v_3, v_4\}, \{v_2, v_5\}, E')$ of the graph $G'$ in Figure 2.5.
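Given the partition $(V_1, V_2)$, the set $E'$ is a simple filter over the edges. Here is a Python sketch (the helper name is ours), applied to the cut of $G'$ from the example.

```python
def cut_edges(E, V1, V2):
    """E' = the edges of E with one endpoint in V1 and the other in V2 (as frozensets)."""
    V1, V2 = set(V1), set(V2)
    if not V1 or not V2 or V1 & V2:
        raise ValueError("V1 and V2 must be nonempty and disjoint")
    return {frozenset((u, v)) for u, v in E
            if (u in V1 and v in V2) or (u in V2 and v in V1)}

E = [("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v2", "v4"), ("v2", "v5")]
print(sorted(sorted(e) for e in cut_edges(E, {"v1", "v3", "v4"}, {"v2", "v5"})))
# [['v1', 'v5'], ['v2', 'v4']]
```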

Definition 2.2.2.33. Let $G = (V, E)$ be a graph. A matching of $G$ is any subset $M$ of $E$ such that if $\{x, y\}$ and $\{u, v\}$ are two different elements of $M$, then $\{x, y\} \cap \{u, v\} = \emptyset$. A matching $M$ is called a maximal matching if, for each edge $\{r, s\} \in E - M$, $M \cup \{\{r, s\}\}$ is not a matching.

For instance, $\{\{v_1, v_3\}\}$, $\{\{v_1, v_3\}, \{v_2, v_5\}\}$, and $\{\{v_1, v_5\}, \{v_2, v_4\}\}$ are matchings of $G'$ in Figure 2.5. The last two are maximal matchings in $G'$.
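One simple way to obtain a maximal matching is a single greedy pass over the edges. The pass keeps an edge whenever neither endpoint is matched yet. Every rejected edge then shares an endpoint with a kept edge, so no further edge can be added. The result depends on the edge order. A Python sketch under our own naming:

```python
def is_matching(M):
    """No vertex is an endpoint of two different edges of M."""
    ends = [x for e in M for x in e]
    return len(ends) == len(set(ends))

def is_maximal_matching(M, E):
    return is_matching(M) and all(not is_matching(M + [e]) for e in E if e not in M)

def greedy_maximal_matching(E):
    matched, M = set(), []
    for u, v in E:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

E = [("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v2", "v4"), ("v2", "v5")]
M = greedy_maximal_matching(E)
print(M)                                                     # [('v1', 'v3'), ('v2', 'v4')]
print(is_maximal_matching(M, E))                             # True
print(is_maximal_matching([("v1", "v3")], E))                # False
print(is_maximal_matching([("v1", "v3"), ("v2", "v5")], E))  # True
```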







Definition 2.2.2.34. A graph G = (V, E) is called acyclic if it does not 
contain any cycle. An acyclic , connected graph is called a tree. A rooted 
tree T is a tree in which one of the vertices is distinguished from the others. 
This distinguished vertex is called the root of the tree. 

Any vertex u different from the root is called a leaf (external vertex) of 
the rooted tree T if degr(u) — 1. A vertex v of T with degr(v) > 1 is called 
an internal vertex. 

The tree $T = (\{v_1, v_2, \ldots, v_9\}, \{\{v_1, v_2\}, \{v_1, v_6\}, \{v_2, v_3\}, \{v_2, v_4\}, \{v_3, v_5\}, \{v_6, v_7\}, \{v_7, v_8\}, \{v_7, v_9\}\})$ is depicted in Figure 2.6. If $v_1$ is considered to be the root of $T$, then $v_4$, $v_5$, $v_8$, and $v_9$ are the leaves of $T$. If $v_9$ is taken as the root of $T$, then $v_4$, $v_5$, and $v_8$ are the leaves of this rooted tree.

If a tree $T$ is a rooted tree with a root $w$, then we usually denote the tree $T$ by $T_w$.
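The leaves and internal vertices of a rooted tree depend only on the vertex degrees and on the choice of the root. The sketch below (helper names are ours) computes them for the tree $T$ of Figure 2.6, using the edges of $T$ listed above; it assumes the tree has at least one edge.

```python
from collections import Counter

# Edges of the tree T of Figure 2.6.
T_EDGES = [("v1", "v2"), ("v1", "v6"), ("v2", "v3"), ("v2", "v4"),
           ("v3", "v5"), ("v6", "v7"), ("v7", "v8"), ("v7", "v9")]


def leaves_and_internal(edges, root):
    """Return (leaves, internal vertices) of the tree rooted at `root`."""
    deg = Counter(v for e in edges for v in e)
    leaves = {u for u, d in deg.items() if d == 1 and u != root}
    internal = {u for u, d in deg.items() if d > 1}
    return leaves, internal


print(sorted(leaves_and_internal(T_EDGES, "v1")[0]))  # ['v4', 'v5', 'v8', 'v9']
print(sorted(leaves_and_internal(T_EDGES, "v9")[0]))  # ['v4', 'v5', 'v8']
```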







2 Elementary Fundamentals 



Definition 2.2.2.35. A multigraph $G$ is a pair $(V, H)$, where

(i) $V$ is a finite set called the set of vertices of $G$, and

(ii) $H$ is a multiset of elements from $\{\{u, v\} \mid u, v \in V, u \neq v\}$ called the set of multiedges of $G$.

An example of a multigraph is $G_1 = (V, H)$, where $V = \{v_1, v_2, v_3, v_4\}$ and $H = \{\{v_1, v_2\}, \{v_1, v_2\}, \{v_1, v_2\}, \{v_2, v_3\}, \{v_1, v_3\}, \{v_1, v_3\}, \{v_3, v_4\}\}$; a possible picture representation is given in Figure 2.7.




A multigraph $G$ of $n$ vertices $v_1, v_2, \ldots, v_n$ can be represented by a symmetric $n \times n$ matrix $M_G = [b_{ij}]_{i,j=1,\ldots,n}$, where $b_{ij}$ is the number of edges between $v_i$ and $v_j$. $M_G$ is called the adjacency matrix of $G$.

The following matrix is the adjacency matrix of the multigraph $G_1$ in Figure 2.7:
$$M_{G_1} = \begin{pmatrix} 0 & 3 & 2 & 0\\ 3 & 0 & 1 & 0\\ 2 & 1 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}$$
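The adjacency matrix of a multigraph can be computed by counting how often each pair occurs in the multiset $H$. A minimal Python sketch, with $H$ given as a list so that repeated multiedges are kept (the function name is hypothetical):

```python
def multigraph_adjacency(vertices, multiedges):
    """Symmetric matrix whose entry (i, j) counts the multiedges between v_i and v_j."""
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    b = [[0] * n for _ in range(n)]
    for u, v in multiedges:          # repeated pairs of the multiset are all counted
        b[index[u]][index[v]] += 1
        b[index[v]][index[u]] += 1
    return b


V = ["v1", "v2", "v3", "v4"]
H = [("v1", "v2")] * 3 + [("v2", "v3"), ("v1", "v3"), ("v1", "v3"), ("v3", "v4")]
for row in multigraph_adjacency(V, H):
    print(row)
# [0, 3, 2, 0]
# [3, 0, 1, 0]
# [2, 1, 0, 1]
# [0, 0, 1, 0]
```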

All notions such as degree, path, cycle, connectivity, and matching can be defined for multigraphs in the same way as for graphs, and so we omit their formal definitions. For any two graphs or multigraphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, we say that $G_1$ is a subgraph of $G_2$ if $V_1 \subseteq V_2$ and $E_1 \subseteq E_2$.

In what follows we define directed graphs and some fundamental notions connected with them. Informally, a directed graph differs from a graph in having a direction (from one vertex to another vertex) on every edge.

Definition 2.2.2.36. A directed graph $G$ is a pair $(V, E)$, where

(i) $V$ is a finite set called the set of vertices of $G$, and

(ii) $E \subseteq (V \times V) - \{(v, v) \mid v \in V\} = \{(u, v) \mid u \neq v, u, v \in V\}$ is the set of (directed) edges of $G$.

If $(u, v) \in E$, then we say that $(u, v)$ leaves the vertex $u$ and that $(u, v)$ enters the vertex $v$. We also say that $(u, v)$ is incident to the vertices $u$ and $v$.







The directed graph $G_2 = (\{v_1, v_2, v_3, v_4, v_5, v_6\}, \{(v_1, v_2), (v_2, v_1), (v_1, v_3), (v_2, v_4), (v_2, v_5), (v_4, v_2), (v_4, v_5), (v_5, v_3)\})$ is depicted in Figure 2.8. The edge $(v_2, v_4)$ leaves the vertex $v_2$ and enters the vertex $v_4$.




Fig. 2.8. 



Again, we can use the notion of an adjacency matrix to represent a directed graph. For any directed graph $G = (V, E)$ of $n$ vertices $v_1, \ldots, v_n$, the adjacency matrix of $G$, $M_G = [c_{ij}]_{i,j=1,\ldots,n}$, is defined by
$$c_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E,\\ 0 & \text{if } (v_i, v_j) \notin E. \end{cases}$$

The following matrix is the adjacency matrix of the directed graph $G_2$ depicted in Figure 2.8:
$$M_{G_2} = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
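For directed graphs the adjacency matrix need not be symmetric, since only the entry $c_{ij}$ for the edge $(v_i, v_j)$ is set. The following sketch (vertices numbered $1, \ldots, n$; names are ours) rebuilds $M_{G_2}$ from the edge list of $G_2$:

```python
def directed_adjacency(n, edges):
    """c_ij = 1 iff (v_i, v_j) is an edge; vertices are numbered 1..n."""
    c = [[0] * n for _ in range(n)]
    for i, j in edges:
        c[i - 1][j - 1] = 1          # only the direction i -> j is recorded
    return c


G2_EDGES = [(1, 2), (2, 1), (1, 3), (2, 4), (2, 5), (4, 2), (4, 5), (5, 3)]
M_G2 = directed_adjacency(6, G2_EDGES)
for row in M_G2:
    print(" ".join(map(str, row)))
```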

Exercise 2.2.2.37. How many directed graphs of $n$ vertices $v_1, \ldots, v_n$ exist? □

Let $G = (V, E)$ be a directed graph, and let $v$ be a vertex of $G$. The indegree of $v$, $\mathrm{indeg}_G(v)$, is the number of edges of $G$ entering $v$ (i.e., $\mathrm{indeg}_G(v) = |E \cap (V \times \{v\})|$). The outdegree of $v$, $\mathrm{outdeg}_G(v)$, is the number of edges of $G$ leaving $v$ (i.e., $\mathrm{outdeg}_G(v) = |E \cap (\{v\} \times V)|$). For instance, $\mathrm{indeg}_{G_2}(v_5) = 2$, $\mathrm{outdeg}_{G_2}(v_5) = 1$, and $\mathrm{outdeg}_{G_2}(v_2) = 3$. The degree of $v$ in $G$ is
$$\deg_G(v) = \mathrm{indeg}_G(v) + \mathrm{outdeg}_G(v).$$
For instance, $\deg_{G_2}(v_2) = 5$ and $\deg_{G_2}(v_6) = 0$. A vertex $u$ with $\deg_G(u) = 0$ is called an isolated vertex of $G$. The degree of a directed graph $G = (V, E)$ is
$$\deg(G) = \max\{\deg_G(v) \mid v \in V\}.$$
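In terms of the adjacency matrix, $\mathrm{outdeg}_G(v_i)$ is the sum of row $i$ and $\mathrm{indeg}_G(v_i)$ is the sum of column $i$. A small sketch based on that observation, applied to $M_{G_2}$ (the function name is hypothetical):

```python
M_G2 = [[0, 1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]


def degrees(m):
    """Return (indeg, outdeg, deg) lists; index i corresponds to vertex v_{i+1}."""
    n = len(m)
    outdeg = [sum(row) for row in m]                              # row sums
    indeg = [sum(m[i][j] for i in range(n)) for j in range(n)]    # column sums
    deg = [a + b for a, b in zip(indeg, outdeg)]
    return indeg, outdeg, deg


indeg, outdeg, deg = degrees(M_G2)
print(indeg[4], outdeg[4], outdeg[1], deg[1], deg[5])   # 2 1 3 5 0
print(max(deg))                                          # deg(G_2) = 5
```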

Definition 2.2.2.38. Let $G = (V, E)$ be a directed graph. A (directed) path in $G$ is a sequence of vertices $P = v_1, v_2, \ldots, v_m$, $v_i \in V$ for $i = 1, \ldots, m$, such that $(v_i, v_{i+1}) \in E$ for $i = 1, \ldots, m - 1$. We say that $P$ is a path from $v_1$ to $v_m$. The length of $P$ is $m - 1$, i.e., the number of its edges.

A directed path $P = v_1, \ldots, v_m$ is called simple if either $|\{v_1, \ldots, v_m\}| = m$ or ($v_1 = v_m$ and $|\{v_1, \ldots, v_{m-1}\}| = m - 1$). A path $P = v_1, \ldots, v_m$ is called a cycle if $v_1 = v_m$. A cycle $v_1, \ldots, v_m$ is simple if $|\{v_1, \ldots, v_{m-1}\}| = m - 1$. A directed graph $G$ is called acyclic if it does not contain any cycle.

$P = v_1, v_2, v_1, v_3$ is a directed path in the directed graph $G_2$ in Figure 2.8, and $v_1, v_2, v_1$ is a simple cycle. A directed graph is strongly connected if for all vertices $u, v \in V$, $u \neq v$, there are directed paths from $u$ to $v$ and from $v$ to $u$ in $G$. The graph $G_2$ is not strongly connected (for instance, $v_6$ is an isolated vertex).
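Strong connectivity can be tested by checking that, from one fixed vertex $s$, every vertex is reachable both in $G$ and in the graph with all edges reversed. This two-search approach is a standard technique chosen here for illustration, not a method taken from the text; the sketch uses breadth-first search and hypothetical names.

```python
from collections import deque

G2_VERTICES = set(range(1, 7))
G2_EDGES = [(1, 2), (2, 1), (1, 3), (2, 4), (2, 5), (4, 2), (4, 5), (5, 3)]


def reachable(adj, s):
    """Vertices reachable from s by directed paths (breadth-first search)."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(vertices, edges):
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)   # the reversed edge (v, u)
    s = next(iter(vertices))
    return reachable(forward, s) == vertices and reachable(backward, s) == vertices


print(is_strongly_connected(G2_VERTICES, G2_EDGES))   # False
```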

Numerous real situations can be represented as graphs. Sometimes we need 
a more powerful representation formalism called weighted graphs. 

Definition 2.2.2.39. A weighted [directed] graph $G$ is a triple $(V, E, weight)$, where

(i) $(V, E)$ is a [directed] graph, and

(ii) $weight$ is a function from $E$ to $\mathbb{Q}^+$.

The adjacency matrix of a weighted graph $G = (V, E, weight)$ is $M_G = [a_{ij}]_{i,j=1,\ldots,|V|}$, where $a_{ij} = weight(\{v_i, v_j\})$ if $\{v_i, v_j\} \in E$ and $a_{ij} = 0$ if $\{v_i, v_j\} \notin E$.





The weighted graph
$$G = (\{v_1, v_2, v_3, v_4\}, \{\{v_i, v_j\} \mid i, j \in \{1, 2, 3, 4\}, i \neq j\}, weight),$$
where $weight(\{v_1, v_2\}) = 7$, $weight(\{v_1, v_3\}) = 1$, $weight(\{v_1, v_4\}) = 2$, $weight(\{v_2, v_3\}) = 1$, $weight(\{v_2, v_4\}) = 2$, and $weight(\{v_3, v_4\}) = 4$, is the graph depicted in Figure 2.9. Its adjacency matrix is the following symmetric matrix:
$$M_G = \begin{pmatrix} 0 & 7 & 1 & 2\\ 7 & 0 & 1 & 2\\ 1 & 1 & 0 & 4\\ 2 & 2 & 4 & 0 \end{pmatrix}.$$

Observe that $G$ contains a Hamiltonian tour $v_1, v_3, v_2, v_4, v_1$.
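Since $a_{ij} = 0$ exactly when there is no edge (all weights are positive), a candidate tour can be checked directly against the adjacency matrix. The sketch below, with our own function names, checks that a vertex sequence such as $v_1, v_3, v_2, v_4, v_1$ is a Hamiltonian tour of $G$ and sums the weights of its edges.

```python
A = [[0, 7, 1, 2],
     [7, 0, 1, 2],
     [1, 1, 0, 4],
     [2, 2, 4, 0]]


def is_hamiltonian_tour(a, tour):
    """tour lists vertex numbers 1..n, visits each once, returns to its start, uses only edges."""
    n = len(a)
    return (len(tour) == n + 1 and tour[0] == tour[-1]
            and sorted(tour[:-1]) == list(range(1, n + 1))
            and all(a[u - 1][v - 1] > 0 for u, v in zip(tour, tour[1:])))


def tour_weight(a, tour):
    return sum(a[u - 1][v - 1] for u, v in zip(tour, tour[1:]))


tour = [1, 3, 2, 4, 1]
print(is_hamiltonian_tour(A, tour), tour_weight(A, tour))   # True 6
```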

Keywords introduced in Section 2.2.2 

permutation, combination, binomial coefficients, polynomial, asymptotic growth of 
functions, arithmetic series, geometric series, harmonic numbers, telescoping series, 
graph, adjacency matrix of a graph, Hamiltonian tour, Eulerian tour, bipartite graph, 
cut, matching, tree, multigraph, directed graph, weighted graph 

2.2.3 Boolean Functions and Formulae 

The aim of this section is to give elementary fundamentals of Boolean logic and 
to present some basic representations of Boolean functions such as Boolean for- 
mulae and branching programs. For formulae we consider some special normal 
forms such as disjunctive normal form (DNF) and conjunctive normal form 
(CNF). 

Boolean logic is a fundamental mathematical system built around the two 
values “TRUE” and “FALSE”. The values TRUE and FALSE are called 
Boolean values. In what follows we use the value 1 for TRUE and the value 
0 for FALSE. 

One manipulates Boolean values with logical (Boolean) operations. The fundamental ones are negation, conjunction, disjunction, exclusive or, equivalence, and implication. Negation is a unary operation denoted by $\neg$ and defined by $\neg(0) = 1$ and $\neg(1) = 0$. Sometimes we use the notation $\bar{0}$ and $\bar{1}$ instead of $\neg(0)$ and $\neg(1)$, respectively. Conjunction is a binary operation denoted by the symbol $\wedge$, and it corresponds to the logical AND, i.e., the result is 1 if and only if both arguments are 1s. Disjunction is a binary operation designated with the symbol $\vee$, and it corresponds to the logical OR, i.e., the result is 1 if and only if at least one of the arguments is 1. Corresponding to their logical meaning, they are defined as follows:
$$\begin{array}{ll} 0 \wedge 0 = 0 & 0 \vee 0 = 0\\ 0 \wedge 1 = 0 & 0 \vee 1 = 1\\ 1 \wedge 0 = 0 & 1 \vee 0 = 1\\ 1 \wedge 1 = 1 & 1 \vee 1 = 1 \end{array}$$







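Exclusive or is a binary operation designated by the symbol $\oplus$, implication is designated by the symbol $\Rightarrow$, and equivalence is designated by the symbol $\Leftrightarrow$. They are defined as follows:
$$\begin{array}{lll} 0 \oplus 0 = 0 & 0 \Rightarrow 0 = 1 & 0 \Leftrightarrow 0 = 1\\ 0 \oplus 1 = 1 & 0 \Rightarrow 1 = 1 & 0 \Leftrightarrow 1 = 0\\ 1 \oplus 0 = 1 & 1 \Rightarrow 0 = 0 & 1 \Leftrightarrow 0 = 0\\ 1 \oplus 1 = 0 & 1 \Rightarrow 1 = 1 & 1 \Leftrightarrow 1 = 1 \end{array}$$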



So, the result of the exclusive or is 1 if and only if exactly one of its operands is 1, and the result of the equivalence is 1 if and only if both operands are equal. The result of the implication is 0 if and only if the first argument is 1 and the second argument is 0, because the logical TRUE must not imply the logical FALSE.
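The six operations can be written with Python's integer bit operators on the values 0 and 1. The sketch below (the function names are our own) prints all four rows of the tables above.

```python
def NOT(a):
    return 1 - a


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def XOR(a, b):
    return a ^ b


def IMPLIES(a, b):
    return NOT(a) | b        # 0 only for a = 1, b = 0


def EQUIV(a, b):
    return NOT(a ^ b)        # 1 iff a == b


print("a b | and or xor => <=>")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "|", AND(a, b), OR(a, b), XOR(a, b), IMPLIES(a, b), EQUIV(a, b))
```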



Exercise 2.2.3.1. Determine which of the above Boolean operations are associative and commutative. □



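Exercise 2.2.3.2. Prove that for any Boolean values $\alpha$ and $\beta$,

(i) $\alpha \vee \beta = \neg(\neg(\alpha) \wedge \neg(\beta))$,

(ii) $\alpha \wedge \beta = \neg(\neg(\alpha) \vee \neg(\beta))$,

(iii) $\alpha \oplus \beta = \neg(\alpha \Leftrightarrow \beta)$,

(iv) $\alpha \Rightarrow \beta = \neg(\alpha) \vee \beta$, and

(v) $\alpha \Leftrightarrow \beta = (\alpha \Rightarrow \beta) \wedge (\beta \Rightarrow \alpha)$.

□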



Definition 2.2.3.3. A Boolean variable is any symbol to which one can associate either of the values 0 or 1.

Let $X = \{x_1, \ldots, x_n\}$ be a set of Boolean variables for some $n \in \mathbb{N}$. A Boolean function over $X$ is any mapping $f$ from $\{0, 1\}^n$ to $\{0, 1\}$. One denotes $f$ by $f(x_1, x_2, \ldots, x_n)$ if one wants to call attention to the names of its variables.

Every argument $\alpha \in \{0, 1\}^n$ of $f$ can also be viewed as a mapping $\alpha : X \to \{0, 1\}$ that assigns a Boolean value to every variable $x \in X$. Because of this we call $\alpha$ an input assignment of $f$.

The simplest possibility to represent a Boolean function of $n$ variables is to list the values of the function for all $2^n$ possible arguments (input assignments). Figure 2.10 presents such a representation of a Boolean function $f$ of three variables $x_1$, $x_2$, and $x_3$.

$$\begin{array}{ccc|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3)\\ \hline 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 0 & 1 & 1 & 1\\ 1 & 0 & 0 & 1\\ 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 1\\ 1 & 1 & 1 & 0 \end{array}$$

Fig. 2.10.

Definition 2.2.3.4. Let $f(x_1, \ldots, x_n)$ be a Boolean function over a set of Boolean variables $X = \{x_1, \ldots, x_n\}$. For every argument $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \{0, 1\}^n$ (input assignment $\alpha$ with $\alpha(x_i) = \alpha_i$ for $i = 1, 2, \ldots, n$) such that $f(\alpha) = f(\alpha_1, \alpha_2, \ldots, \alpha_n) = 1$, we say that $\alpha$ satisfies $f$.
$$N^1(f) = \{\alpha \in \{0, 1\}^n \mid f(\alpha) = 1\}$$
is the set of all input assignments satisfying $f$.
$$N^0(f) = \{\beta \in \{0, 1\}^n \mid f(\beta) = 0\}$$
is the set of all input assignments that do not satisfy $f$.

We say that $f$ is satisfiable if there exists an input assignment satisfying $f$ (i.e., $|N^1(f)| \geq 1$).

The function $f(x_1, x_2, x_3)$ in Figure 2.10 is satisfiable. The input assignment $\alpha : \{x_1, x_2, x_3\} \to \{0, 1\}$ with $\alpha(x_1) = \alpha(x_2) = \alpha(x_3) = 0$ (the argument $(0, 0, 0)$ of $f$) satisfies $f(x_1, x_2, x_3)$. Moreover,
$$N^1(f(x_1, x_2, x_3)) = \{(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0)\}, \text{ and}$$
$$N^0(f(x_1, x_2, x_3)) = \{(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1)\}.$$

Thus, we see that another possibility to represent a Boolean function $f$ is to give $N^1(f)$ or $N^0(f)$.
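Given the value table of Figure 2.10, the sets $N^1(f)$ and $N^0(f)$ are obtained by enumerating all $2^n$ arguments. In the sketch below, `VALUES` (our name) lists the last column of Figure 2.10 in the order $000, 001, \ldots, 111$, which is also the order in which `itertools.product` enumerates the arguments.

```python
from itertools import product

# Last column of Figure 2.10 for (x1, x2, x3) = 000, 001, ..., 111.
VALUES = [1, 0, 0, 1, 1, 0, 1, 0]


def f(x1, x2, x3):
    return VALUES[4 * x1 + 2 * x2 + x3]


N1 = [a for a in product((0, 1), repeat=3) if f(*a) == 1]
N0 = [a for a in product((0, 1), repeat=3) if f(*a) == 0]
print(N1)             # [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
print(N0)             # [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(len(N1) >= 1)   # True, i.e., f is satisfiable
```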

Definition 2.2.3.5. Let $X = \{x_1, \ldots, x_n\}$ be a set of $n$ Boolean variables, $n \in \mathbb{N} - \{0\}$. Let $f(x_1, \ldots, x_n)$ be a Boolean function over $X$. We say that $f(x_1, \ldots, x_n)$ essentially depends on the variable $x_i$ iff there exist $n - 1$ Boolean values
$$\alpha_1, \alpha_2, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_n \in \{0, 1\}$$
such that
$$f(\alpha_1, \ldots, \alpha_{i-1}, 0, \alpha_{i+1}, \ldots, \alpha_n) \neq f(\alpha_1, \ldots, \alpha_{i-1}, 1, \alpha_{i+1}, \ldots, \alpha_n).$$
If $f(x_1, \ldots, x_n)$ does not essentially depend on a variable $x_j$ for some $j \in \{1, \ldots, n\}$, then we say that the Boolean variable $x_j$ is dummy for $f$.

For instance, the Boolean function $f(x_1, x_2, x_3)$ from Figure 2.10 essentially depends on $x_1$ because, by fixing $x_2 = 1$ and $x_3 = 0$, we obtain
$$f(0, 1, 0) = 0 \neq 1 = f(1, 1, 0).$$
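Definition 2.2.3.5 suggests a direct test: for each fixing of the other $n - 1$ variables, compare the two values obtained for $x_i = 0$ and $x_i = 1$. A brute-force sketch (exponential in $n$; names are hypothetical, and $i$ is 1-based as in the text):

```python
from itertools import product

VALUES = [1, 0, 0, 1, 1, 0, 1, 0]   # Figure 2.10, rows (x1, x2, x3) = 000..111


def f(x1, x2, x3):
    return VALUES[4 * x1 + 2 * x2 + x3]


def essentially_depends(g, n, i):
    """Does g (with n arguments) essentially depend on x_i?"""
    for rest in product((0, 1), repeat=n - 1):
        a0 = rest[:i - 1] + (0,) + rest[i - 1:]   # x_i = 0, others fixed by rest
        a1 = rest[:i - 1] + (1,) + rest[i - 1:]   # x_i = 1, others fixed by rest
        if g(*a0) != g(*a1):
            return True
    return False


print(essentially_depends(f, 3, 1))                      # True
print(essentially_depends(lambda x1, x2: x1, 2, 2))      # False: x2 is dummy here
```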







Exercise 2.2.3.6. Determine whether the Boolean function $f(x_1, x_2, x_3)$ from Figure 2.10 depends essentially on $x_2$ and $x_3$. □

Exercise 2.2.3.7. Prove that every Boolean function $f$ with $|N^1(f)| = 1$ essentially depends on all its input variables. □

The most common way to describe Boolean functions is the representation via Boolean formulae.

Definition 2.2.3.8. Let $X$ be a countable set of Boolean variables and let $S$ be a set of unary and binary Boolean operations. The class of Boolean formulae over $X$ and $S$ (formulae for short) is defined recursively as follows:

(i) The Boolean values 0 and 1 are Boolean formulae.

(ii) For every Boolean variable $x \in X$, $x$ is a Boolean formula.

(iii) If $F$ is a Boolean formula and $\varphi$ is a unary Boolean operation from $S$, then $\varphi(F)$ is a Boolean formula.

(iv) If $F_1$ and $F_2$ are Boolean formulae, and $\circ \in S$ is a binary operation, then $(F_1 \circ F_2)$ is a Boolean formula.

(v) Only the expressions constructed by using (i), (ii), (iii), and (iv) are Boolean formulae over $X$ and $S$.

An example of a Boolean formula over the set of Boolean variables $X = \{x_1, x_2, x_3, \ldots\}$ and $S = \{\vee, \wedge, \Leftrightarrow, \Rightarrow\}$ is
$$F = (((x_1 \vee x_2) \wedge x_7) \Rightarrow (x_2 \Leftrightarrow x_3)).$$
Since the operations $\vee$, $\wedge$, $\oplus$, and $\Leftrightarrow$ are commutative and associative, we may sometimes omit the parentheses. Thus, for instance, the expressions $((x_1 \vee x_2) \vee (x_3 \vee x_4))$, $(((x_1 \vee x_2) \vee x_3) \vee x_4)$, and $x_1 \vee x_2 \vee x_3 \vee x_4$ represent the same Boolean formula. We shall also use $\bigvee_{i=1}^{n} x_i$ [$\bigwedge_{i=1}^{n} x_i$, $\bigoplus_{i=1}^{n} x_i$] instead of $x_1 \vee x_2 \vee \cdots \vee x_n$ [$x_1 \wedge x_2 \wedge \cdots \wedge x_n$, $x_1 \oplus x_2 \oplus \cdots \oplus x_n$].

Definition 2.2.3.9. Let $S$ be a set of unary and binary Boolean operations. Let $X$ be a set of Boolean variables, and let $F$ be a formula over $X$ and $S$. Let $\alpha$ be an input assignment to $X$. The value of $F$ under the input assignment $\alpha$, $F(\alpha)$, is the Boolean value defined as follows:

(i) $F(\alpha) = 0$ if $F = 0$, and $F(\alpha) = 1$ if $F = 1$,

(ii) $F(\alpha) = \alpha(x)$ if $F = x$ for an $x \in X$,

(iii) $F(\alpha) = \varphi(F_1(\alpha))$ if $F = \varphi(F_1)$ for some unary Boolean operation $\varphi \in S$,

(iv) $F(\alpha) = F_1(\alpha) \circ F_2(\alpha)$ if $F = (F_1 \circ F_2)$ for some binary operation $\circ \in S$ and some formulae $F_1$ and $F_2$.

Let $f$ be a Boolean function over $X$. If, for each input assignment $\beta$ from $X$ to $\{0, 1\}$, $f(\beta) = F(\beta)$, then we say that $F$ represents $f$. Two formulae $F_1$ and $F_2$ are equivalent, $F_1 \equiv F_2$, if they represent the same Boolean function (i.e., $F_1(\alpha) = F_2(\alpha)$ for every $\alpha$).
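Definition 2.2.3.9 is a recursion over the structure of a formula, which translates directly into a recursive evaluator. In this sketch, formulae are encoded as nested tuples, an encoding chosen for illustration only (0 and 1 are constants, strings are variables, `("not", F)` is negation, `(op, F1, F2)` applies a binary operation). Equivalence is then tested by evaluating both formulae under every input assignment.

```python
from itertools import product

BINARY = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "impl": lambda a, b: (1 - a) | b,
    "equiv": lambda a, b: 1 - (a ^ b),
}


def value(F, alpha):
    """F(alpha) for a formula F and an assignment alpha (dict: variable -> 0/1)."""
    if isinstance(F, int):            # rule (i): the constants 0 and 1
        return F
    if isinstance(F, str):            # rule (ii): a variable x has the value alpha(x)
        return alpha[F]
    if F[0] == "not":                 # rule (iii): negation, the only unary operation here
        return 1 - value(F[1], alpha)
    op, F1, F2 = F                    # rule (iv): a binary operation
    return BINARY[op](value(F1, alpha), value(F2, alpha))


def variables(F):
    if isinstance(F, int):
        return set()
    if isinstance(F, str):
        return {F}
    return set().union(*(variables(G) for G in F[1:]))


def equivalent(F1, F2):
    """F1 and F2 are equivalent iff they agree under every assignment to their variables."""
    xs = sorted(variables(F1) | variables(F2))
    return all(value(F1, dict(zip(xs, bits))) == value(F2, dict(zip(xs, bits)))
               for bits in product((0, 1), repeat=len(xs)))


F = ("impl", ("and", ("or", "x1", "x2"), "x7"), ("equiv", "x2", "x3"))
print(value(F, {"x1": 1, "x2": 0, "x3": 1, "x7": 1}))                       # 0
print(equivalent(("or", "x1", "x2"),
                 ("not", ("and", ("not", "x1"), ("not", "x2")))))           # True
```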







Obviously, every formula represents exactly one Boolean function. But one Boolean function can be represented by infinitely many formulae. For instance, the formulae $x_1 \vee x_2$, $\neg(\neg(x_1) \wedge \neg(x_2))$, $(x_1 \vee x_2 \vee x_1 \vee x_2) \wedge 1$, and $\neg(x_1 \Rightarrow x_2) \vee \neg(x_2 \Rightarrow x_1) \vee \neg(x_2 \Rightarrow \neg(x_1))$ represent the same Boolean function.

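In what follows, we shall use the following notation. Let $\alpha \in \{0, 1\}$, and let $x$ be a Boolean variable. Then
$$x^{\alpha} = \begin{cases} \neg(x) & \text{if } \alpha = 0,\\ x & \text{if } \alpha = 1. \end{cases}$$
We also write $\bar{x}$ for $x^0 = \neg(x)$.

A clause over $\{x_1, \ldots, x_n\}$ is a formula $x_{i_1}^{\alpha_1} \vee x_{i_2}^{\alpha_2} \vee \cdots \vee x_{i_m}^{\alpha_m}$ for any $(\alpha_1, \ldots, \alpha_m) \in \{0, 1\}^m$ and any $\{i_1, i_2, \ldots, i_m\} \subseteq \{1, 2, \ldots, n\}$. Note that $m$ may be different from $n$. For instance, $x_1 \vee x_3$ and $x_1 \vee x_2 \vee x_3$ are clauses over $\{x_1, x_2, x_3\}$.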



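Exercise 2.2.3.10. Prove the following equivalences between formulae:

(i) $x \vee 0 \equiv x$, $x \wedge 1 \equiv x$, $x \oplus 0 \equiv x$, $x \oplus 1 \equiv \bar{x}$,

(ii) $x \wedge (y \oplus z) \equiv (x \wedge y) \oplus (x \wedge z)$,
$x \wedge (y \vee z) \equiv (x \wedge y) \vee (x \wedge z)$,
$x \vee (y \wedge z) \equiv (x \vee y) \wedge (x \vee z)$,

(iii) $x \vee x \equiv x \wedge x \equiv x \vee (x \wedge y) \equiv x \wedge (x \vee y) \equiv x$,

(iv) $x \vee \bar{x} \equiv x \oplus \bar{x} \equiv 1$, and

(v) $x \wedge \bar{x} \equiv x \oplus x \equiv 0$.

□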



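Definition 2.2.3.11. Let $n$ be a positive integer, and let $X = \{x_1, \ldots, x_n\}$ be a set of Boolean variables. For every $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \{0, 1\}^n$ we define the minterm over $X$ according to $\alpha$ as the Boolean formula
$$\mathit{minterm}_{\alpha}(x_1, \ldots, x_n) = x_1^{\alpha_1} \wedge x_2^{\alpha_2} \wedge \cdots \wedge x_n^{\alpha_n},$$
and the maxterm over $X$ according to $\alpha$ as the clause
$$\mathit{maxterm}_{\alpha}(x_1, \ldots, x_n) = x_1^{\bar{\alpha}_1} \vee x_2^{\bar{\alpha}_2} \vee \cdots \vee x_n^{\bar{\alpha}_n}.$$

For instance, $\mathit{minterm}_{(1,1,0)}(x_1, x_2, x_3) = x_1^1 \wedge x_2^1 \wedge x_3^0 = x_1 \wedge x_2 \wedge \bar{x}_3$ and $\mathit{maxterm}_{(0,1,0)}(x_1, x_2, x_3) = x_1^1 \vee x_2^0 \vee x_3^1 = x_1 \vee \bar{x}_2 \vee x_3$.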

Observation 2.2.3.12. For each $\alpha \in \{0, 1\}^n$, $\mathit{minterm}_{\alpha}(x_1, \ldots, x_n)$ takes the Boolean value 1 iff $x_i = \alpha_i$ for $i = 1, \ldots, n$ (i.e., $N^1(\mathit{minterm}_{\alpha}(x_1, \ldots, x_n)) = \{\alpha\}$), and $\mathit{maxterm}_{\alpha}(x_1, \ldots, x_n)$ takes the Boolean value 0 iff $x_i = \alpha_i$ for $i = 1, \ldots, n$ (i.e., $N^0(\mathit{maxterm}_{\alpha}(x_1, \ldots, x_n)) = \{\alpha\}$).

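Proof. Obviously, for any $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \{0, 1\}^n$,
$$\mathit{minterm}_{\alpha}(\alpha_1, \ldots, \alpha_n) = \alpha_1^{\alpha_1} \wedge \alpha_2^{\alpha_2} \wedge \cdots \wedge \alpha_n^{\alpha_n} = 1$$
because $\beta^{\beta} = 1$ for every Boolean value $\beta$. Since $\beta^{\gamma} = 0$ for all Boolean values $\beta \neq \gamma$,
$$\mathit{minterm}_{\alpha}(\beta_1, \ldots, \beta_n) = \beta_1^{\alpha_1} \wedge \beta_2^{\alpha_2} \wedge \cdots \wedge \beta_n^{\alpha_n} = 0$$
for all vectors $(\beta_1, \beta_2, \ldots, \beta_n) \neq (\alpha_1, \alpha_2, \ldots, \alpha_n)$, because $\beta_i \neq \alpha_i$ for at least one $i$.

Similarly, for any $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_n) \in \{0, 1\}^n$,
$$\mathit{maxterm}_{\gamma}(\gamma_1, \gamma_2, \ldots, \gamma_n) = \gamma_1^{\bar{\gamma}_1} \vee \gamma_2^{\bar{\gamma}_2} \vee \cdots \vee \gamma_n^{\bar{\gamma}_n} = 0$$
because $1^0 = 0^1 = 0$ and so $\beta^{\bar{\beta}} = 0$ for each $\beta \in \{0, 1\}$. Let $(\gamma_1, \ldots, \gamma_n), (\beta_1, \ldots, \beta_n) \in \{0, 1\}^n$ and $(\gamma_1, \ldots, \gamma_n) \neq (\beta_1, \ldots, \beta_n)$. Then there exists $i \in \{1, \ldots, n\}$ such that $\beta_i \neq \gamma_i$ (i.e., $\beta_i = \bar{\gamma}_i$). Thus,
$$\mathit{maxterm}_{\gamma}(\beta_1, \ldots, \beta_n) = \beta_1^{\bar{\gamma}_1} \vee \cdots \vee \beta_n^{\bar{\gamma}_n} = 1,$$
since $\beta_i^{\bar{\gamma}_i} = \beta_i^{\beta_i} = 1$.

□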
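Observation 2.2.3.12 can also be checked exhaustively for small $n$. The sketch below evaluates $x^a$, minterms, and maxterms directly from their definitions (function names are ours) and asserts that $N^1(\mathit{minterm}_\alpha) = N^0(\mathit{maxterm}_\alpha) = \{\alpha\}$ for every $\alpha$ with $n = 3$.

```python
from itertools import product


def power(x, a):
    """x^a: x if a == 1, not(x) if a == 0 (so x^a == 1 iff x == a)."""
    return x if a == 1 else 1 - x


def minterm(alpha, x):
    return int(all(power(xi, ai) == 1 for xi, ai in zip(x, alpha)))


def maxterm(alpha, x):
    # maxterm_alpha uses the literals x_i^(not alpha_i)
    return int(any(power(xi, 1 - ai) == 1 for xi, ai in zip(x, alpha)))


n = 3
for alpha in product((0, 1), repeat=n):
    ones = [x for x in product((0, 1), repeat=n) if minterm(alpha, x) == 1]
    zeros = [x for x in product((0, 1), repeat=n) if maxterm(alpha, x) == 0]
    assert ones == [alpha] and zeros == [alpha]
print("Observation 2.2.3.12 holds for n =", n)
```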

Using the notions of minterm and maxterm, one can assign a unique formula in a special normal form to every Boolean function.

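Theorem 2.2.3.13. Let $f : \{0, 1\}^n \to \{0, 1\}$, $n \in \mathbb{N} - \{0\}$, be a Boolean function over $X = \{x_1, \ldots, x_n\}$. Then
$$f(x_1, \ldots, x_n) = \bigvee_{\alpha \in N^1(f)} \mathit{minterm}_{\alpha}(x_1, \ldots, x_n) = \bigvee_{\alpha \in N^1(f)} \left(x_1^{\alpha_1} \wedge \cdots \wedge x_n^{\alpha_n}\right).$$

Proof. Let $\beta = (\beta_1, \ldots, \beta_n) \in \{0, 1\}^n$ be a vector such that $f(\beta) = 1$, i.e., $\beta \in N^1(f)$. Then $\mathit{minterm}_{\beta}(\beta_1, \ldots, \beta_n) = 1$ and so
$$\bigvee_{\alpha \in N^1(f)} \mathit{minterm}_{\alpha}(\beta_1, \ldots, \beta_n) = 1.$$
If $\gamma = (\gamma_1, \ldots, \gamma_n) \in \{0, 1\}^n$ is a vector such that $f(\gamma) = 0$, then $\gamma \notin N^1(f)$. Following Observation 2.2.3.12, we have $\mathit{minterm}_{\alpha}(\gamma) = 0$ for each $\alpha \in N^1(f)$, since $\alpha \neq \gamma$. So,
$$\bigvee_{\alpha \in N^1(f)} \mathit{minterm}_{\alpha}(\gamma_1, \ldots, \gamma_n) = 0.$$

□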

The formula $\bigvee_{\alpha \in N^1(f)} (x_1^{\alpha_1} \wedge \cdots \wedge x_n^{\alpha_n})$ is called the complete disjunctive normal form, or complete DNF, of $f$. The complete disjunctive normal form is unique for every Boolean function $f$ because it is unambiguously determined by $N^1(f)$.

The complete DNF of the Boolean function $f(x_1, x_2, x_3)$ from Figure 2.10 is
$$(\bar{x}_1 \wedge \bar{x}_2 \wedge \bar{x}_3) \vee (\bar{x}_1 \wedge x_2 \wedge x_3) \vee (x_1 \wedge \bar{x}_2 \wedge \bar{x}_3) \vee (x_1 \wedge x_2 \wedge \bar{x}_3),$$
because $N^1(f) = \{(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0)\}$.

The next assertion can be proved in a way analogous to the proof of Theorem 2.2.3.13.







Theorem 2.2.3.14. Let $f : \{0, 1\}^n \to \{0, 1\}$, $n \in \mathbb{N} - \{0\}$, be a Boolean function over $X = \{x_1, \ldots, x_n\}$. Then
$$f(x_1, \ldots, x_n) = \bigwedge_{\alpha \in N^0(f)} \mathit{maxterm}_{\alpha}(x_1, \ldots, x_n) = \bigwedge_{\alpha \in N^0(f)} \left(x_1^{\bar{\alpha}_1} \vee \cdots \vee x_n^{\bar{\alpha}_n}\right).$$

Exercise 2.2.3.15. Prove Theorem 2.2.3.14. □

The formula $\bigwedge_{\alpha \in N^0(f)} (x_1^{\bar{\alpha}_1} \vee \cdots \vee x_n^{\bar{\alpha}_n})$ is called the complete conjunctive normal form, or complete CNF, of $f$. The complete CNF of the Boolean function $f(x_1, x_2, x_3)$ from Figure 2.10 is
$$(x_1 \vee x_2 \vee \bar{x}_3) \wedge (x_1 \vee \bar{x}_2 \vee x_3) \wedge (\bar{x}_1 \vee x_2 \vee \bar{x}_3) \wedge (\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3)$$
because $N^0(f) = \{(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1)\}$.
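Theorems 2.2.3.13 and 2.2.3.14 are constructive: the complete DNF and the complete CNF can be read off the value table. The following sketch prints both for the function of Figure 2.10, writing `~`, `&`, and `|` for $\neg$, $\wedge$, and $\vee$ (a textual notation chosen only for this example; function names are ours).

```python
from itertools import product

VALUES = [1, 0, 0, 1, 1, 0, 1, 0]   # Figure 2.10, rows (x1, x2, x3) = 000..111


def literal(i, a):
    """Text for x_i^a: 'x{i}' if a == 1, '~x{i}' if a == 0."""
    return f"x{i}" if a == 1 else f"~x{i}"


def complete_dnf(values, n):
    rows = product((0, 1), repeat=n)
    # one minterm x_1^a_1 & ... & x_n^a_n for each alpha in N^1(f)
    return " | ".join("(" + " & ".join(literal(i, a) for i, a in enumerate(alpha, 1)) + ")"
                      for alpha, v in zip(rows, values) if v == 1)


def complete_cnf(values, n):
    rows = product((0, 1), repeat=n)
    # one maxterm x_1^(1-a_1) | ... | x_n^(1-a_n) for each alpha in N^0(f)
    return " & ".join("(" + " | ".join(literal(i, 1 - a) for i, a in enumerate(alpha, 1)) + ")"
                      for alpha, v in zip(rows, values) if v == 0)


print(complete_dnf(VALUES, 3))
# (~x1 & ~x2 & ~x3) | (~x1 & x2 & x3) | (x1 & ~x2 & ~x3) | (x1 & x2 & ~x3)
print(complete_cnf(VALUES, 3))
# (x1 | x2 | ~x3) & (x1 | ~x2 | x3) & (~x1 | x2 | ~x3) & (~x1 | ~x2 | ~x3)
```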

Definition 2.2.3.16. Let $X = \{x_1, x_2, x_3, \ldots\}$ be a set of Boolean variables. Let $\bar{X} = \{\bar{x} \mid x \in X\} = \{\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots\}$. A literal (over $X$) is any element from $X \cup \bar{X}$.

Any Boolean formula (over $X$) consisting of a conjunction of clauses is said to be in conjunctive normal form, CNF.

For any clause $F = x_{i_1}^{\alpha_1} \vee x_{i_2}^{\alpha_2} \vee \cdots \vee x_{i_n}^{\alpha_n}$, $(\alpha_1, \alpha_2, \ldots, \alpha_n) \in \{0, 1\}^n$, the size of $F$ is $n$, i.e., the number of its literals. For every positive integer $k$, a formula $\Phi$ is in $k$-conjunctive normal form, $k$CNF, if $\Phi$ is in CNF and every clause of $\Phi$ has a size of at most $k$.

The formula
$$(x_1 \vee x_2) \wedge (x_1 \vee x_3 \vee x_5) \wedge x_2 \wedge (x_4 \vee x_5)$$
is in 3CNF. Every complete CNF of a Boolean function of $n$ variables is in $n$CNF.

We observe that Theorem 2.2.3.13 (as well as Theorem 2.2.3.14) implies that every Boolean function can be expressed as a formula over the set of Boolean operations $\{\vee, \wedge, \neg\}$.

Exercise 2.2.3.17. Prove that every Boolean function can be represented by a formula over

(i) $\{\neg, \vee\}$,

(ii) $\{\neg, \wedge\}$, and

(iii) $\{\wedge, \oplus\}$. □



Exercise 2.2.3.18. Find a binary Boolean operation (Boolean function of two variables) $\varphi$ such that every Boolean function can be represented by a formula over $\{\varphi\}$. □







Branching programs are currently the standard formalism for the computer representation of Boolean functions. This is because this kind of representation is often more concise than the representation by a value table or by formulae, and because some special versions of branching programs can be handled conveniently in several application areas.

Definition 2.2.3.19. Let $X = \{x_1, \ldots, x_n\}$, $n \in \mathbb{N} - \{0\}$, be a set of Boolean variables. A branching program (BP) over $X$ is a directed acyclic graph $G = (V, E)$ with the following labeling and properties:

(i) There is exactly one vertex with indegree 0 in $G$, and this vertex is called the source (or the start vertex) of the branching program.

(ii) Every vertex of nonzero outdegree in $G$ is labeled by a Boolean variable from $X$.

(iii) There are exactly two vertices with outdegree equal to 0. These vertices are called the sinks (or output vertices) of the branching program. One of the sinks is labeled by 0 and the other one by 1.

(iv) Every vertex $v$ of nonzero outdegree has outdegree equal to two. One of the two edges leaving $v$ is labeled by 1 and the other one is labeled by 0.

For any input assignment $\alpha : X \to \{0, 1\}$, a branching program $A$ over $X$ computes the Boolean value $A(\alpha)$ in the following way:

(i) $A$ starts the computation on $\alpha$ in its source.

(ii) If $A$ is in a vertex labeled by a Boolean variable $x \in X$, then $A$ moves via the edge labeled by $\alpha(x)$ to the next vertex.

(iii) If $A$ reaches a sink, then $A(\alpha)$ is the label of that sink.

Let $f(x_1, \ldots, x_n) : \{0, 1\}^n \to \{0, 1\}$ be a Boolean function. We say that a BP $A$ represents (or computes) $f$ if $A(\beta) = f(\beta)$ for every input assignment $\beta : \{x_1, \ldots, x_n\} \to \{0, 1\}$.

For instance, the branching program $A$ in Figure 2.11a follows the computation path
$$x_1 \xrightarrow{0} x_2 \xrightarrow{0} x_1 \xrightarrow{0} x_3 \xrightarrow{0} 1$$
for the input $(0, 0, 0)$, and so $A(0, 0, 0) = 1$. For the inputs $(1, 1, 0)$ and $(1, 0, 0)$, $A$ uses the same computation
$$x_1 \xrightarrow{1} x_3 \xrightarrow{0} 1.$$
One can easily observe that $A$ computes the Boolean function $f(x_1, x_2, x_3)$ given in Figure 2.10.
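The computation rule above amounts to a simple walk from the source to a sink. The sketch below uses a hypothetical dictionary encoding of a BP (each non-sink vertex maps to its variable index and its successors via the 0-edge and the 1-edge; the sinks are the strings `"0"` and `"1"`). Since Figure 2.11 is not reproduced here, it builds its own, far from minimal, branching program for $f$ in the shape of a complete decision tree, whose vertices are the tuples of bits read so far with the empty tuple as source, and then checks that it computes $f$.

```python
from itertools import product

VALUES = [1, 0, 0, 1, 1, 0, 1, 0]   # Figure 2.10, rows (x1, x2, x3) = 000..111


def run(bp, source, alpha):
    """Compute A(alpha): start in the source and always follow the edge labeled alpha(x)."""
    v = source
    while v not in ("0", "1"):
        var, succ0, succ1 = bp[v]
        v = succ1 if alpha[var] == 1 else succ0
    return int(v)


def decision_tree_bp(values, n):
    """Vertex `prefix` asks variable number len(prefix) (0-based); assumes f is not constant."""
    bp = {}
    for depth in range(n):
        for prefix in product((0, 1), repeat=depth):
            succ = []
            for b in (0, 1):
                p = prefix + (b,)
                if depth == n - 1:           # last variable: the edge goes to a sink
                    succ.append(str(values[int("".join(map(str, p)), 2)]))
                else:
                    succ.append(p)
            bp[prefix] = (depth, succ[0], succ[1])
    return bp


A = decision_tree_bp(VALUES, 3)
print(all(run(A, (), alpha) == VALUES[i]
          for i, alpha in enumerate(product((0, 1), repeat=3))))   # True
```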

Every branching program unambiguously determines the Boolean function that it represents. On the other hand, one can represent a Boolean function by different branching programs. The Boolean function $f(x_1, x_2, x_3)$ in Figure 2.10 is represented by both branching programs from Figure 2.11.

Since general branching programs are not so easy to handle, one usually uses some restricted normal forms of them. We shall consider only one such form, where the restriction says that every Boolean variable may be asked for its value at most once in any computation of the branching program.

Fig. 2.11.

Definition 2.2.3.20. Let $X$ be a set of Boolean variables, and let $A$ be a branching program over $X$. We say that $A$ is a one-time-only branching program if for every directed path $P$ of $A$ all vertices on $P$ have pairwise different labels.

We observe that the branching program in Figure 2.11a is not a one-time-only branching program because it asks the Boolean variable $x_1$ twice on the path
$$x_1 \xrightarrow{0} x_2 \xrightarrow{0} x_1 \xrightarrow{0} x_3.$$
On the other hand, the branching program in Figure 2.11b is a one-time-only one.
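Whether a BP is one-time-only can be checked by a search over all paths from the source that remembers which variables have already been asked. This suffices because every vertex of a BP is reachable from its unique source, so every directed path extends to one that starts in the source. The two BPs below are small hypothetical examples, not the programs of Figure 2.11, and the naive search may take exponential time.

```python
def is_one_time_only(bp, source):
    """True iff no directed path from the source asks the same variable twice."""
    def ok(v, asked):
        if v not in bp:                      # v is a sink
            return True
        var, succ0, succ1 = bp[v]
        if var in asked:
            return False
        return ok(succ0, asked | {var}) and ok(succ1, asked | {var})
    return ok(source, frozenset())


# Hypothetical BPs: vertex -> (variable, successor via 0-edge, successor via 1-edge).
asks_x1_twice = {"s": ("x1", "t", "1"), "t": ("x2", "r", "0"), "r": ("x1", "1", "0")}
asks_once = {"s": ("x1", "t", "1"), "t": ("x2", "1", "0")}
print(is_one_time_only(asks_x1_twice, "s"))   # False
print(is_one_time_only(asks_once, "s"))       # True
```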

Exercise 2.2.3.21. Construct

(a) a branching program with the minimal number of vertices for the Boolean function $f(x_1, x_2, x_3)$ from Figure 2.10, and

(b) a minimal (according to the number of vertices) one-time-only branching program representing the function $f(x_1, x_2, x_3)$ from Figure 2.10.

□

Exercise 2.2.3.22. Construct a (one-time-only) branching program for the Boolean function
$$x_1 \oplus x_2 \oplus \cdots \oplus x_n.$$
□



Keywords introduced in Section 2.2.3 

Boolean values, Boolean operations, negation, conjunction, disjunction, implication, exclusive or, equivalence, Boolean variable, Boolean function, input assignment, satisfiability, Boolean formula, clause, minterm, maxterm, complete DNF, complete CNF, literal, conjunctive normal form (CNF), kCNF, branching program, one-time-only branching program



2.2.4 Algebra and Number Theory 

The goal of this section is to recapitulate the definitions of some basic algebraic structures such as groups, semigroups, rings, and fields, and some fundamental results of number theory such as the Fundamental Theorem of Arithmetic, the Prime Number Theorem, Fermat's Theorem, and the Chinese Remainder Theorem. The proofs of all these results, except for the Prime Number Theorem, can be expressed by elementary mathematics, and so we present them. The only algorithmic part of this section is devoted to the presentation of Euclid's algorithm for the greatest common divisor. In fact, we are mainly interested in the algebraic structure $\mathbb{Z}_n$ with the set of elements $\{0, 1, 2, \ldots, n - 1\}$ and the operations of multiplication and addition modulo $n$. An important observation is the fact that $\mathbb{Z}_n$ is a field if and only if $n$ is a prime. This and the fundamental theorems mentioned above are very useful for designing efficient randomized algorithms for algorithmic problems from number theory for which no polynomial-time deterministic algorithm is known. These are the most famous and transparent examples certifying the power and the usefulness of randomization in algorithmics. The knowledge of this section is crucial only for some parts of Chapter 5 on randomized algorithms, and so the reader may skip this section for now and look at it immediately before reading the related sections of Chapter 5.

A fundamental mathematical structure, also called an algebraic structure or an algebra for short, is a pair $(S, F)$, where

(i) $S$ is a set of elements, and

(ii) $F$ is a set of functions that map arguments from $S$ to an element of $S$. More precisely, $F$ is a set of operations on $S$, i.e., for every $f \in F$ there exists an integer $m$ such that $f$ is a function from $S^m$ into $S$. If $f : S^m \to S$, then we say that $f$ is an $m$-ary operation on $S$.







We are not interested in structures with a mapping that for arguments from $S$ produces an element outside $S$. For our purposes we shall consider only the following fundamental sets of elements (numbers) and some fundamental binary operations corresponding to addition and multiplication:

$\mathbb{N} = \{0, 1, 2, \ldots\}$ the set of all non-negative integers

$\mathbb{Z}$ the set of all integers

$\mathbb{N}^+$ the set of all positive integers

$\mathbb{N}_{\geq k} = \mathbb{N} - \{0, 1, \ldots, k - 1\}$ for every positive integer $k$

$\mathbb{Z}_n = \{0, 1, 2, \ldots, n - 1\}$ for every positive integer $n$

$\mathbb{Q}$ the set of rational numbers

$\mathbb{Q}^+$ the set of positive rational numbers

$\mathbb{R}$ the set of real numbers

$\mathbb{R}^+$ the set of positive real numbers

$\mathbb{R}^{\geq 0}$ the set of non-negative real numbers

If $F = \{f_1, \ldots, f_k\}$ is a finite set of operations of an algebra $(S, F)$, we simply write $(S, f_1, f_2, \ldots, f_k)$ instead of $(S, \{f_1, f_2, \ldots, f_k\})$ in what follows. For some binary operations $f$ and $g$ we use the notation $\cdot$ and $+$ if $f$ should be interpreted as a "version" of multiplication and if $g$ should be interpreted as a "version" of addition⁷ in the algebra considered. Thus, instead of $f(x, y)$ [$g(x, y)$] we simply write $x \cdot y$ [$x + y$].

Definition 2.2.4.1. A group is an algebra $(S, *)$, where

(i) $*$ is a binary operation,

(ii) $*$ is associative, i.e., $\forall x, y, z \in S : (x * y) * z = x * (y * z)$,

(iii) there exists an $e \in S$ (called the neutral element according to $*$ in $S$) such that $\forall x \in S$:
$$e * x = x = x * e,$$

(iv) $\forall x \in S$ there exists an element $i(x) \in S$ such that
$$i(x) * x = e = x * i(x),$$
where $i(x)$ is called the inverse element of $x$ according to $*$.

A group is said to be commutative if $x * y = y * x$ for all $x, y \in S$.

In what follows, if * is considered to be multiplication, then the neutral 
element is denoted by 1. If * is considered to be addition, then the neutral 
element is denoted by 0. 

Example 2.2.4.2. $(\mathbb{Z}, +)$ with $+$ as the standard addition is a commutative group. The neutral element is 0, and $i(x) = -x$ for every $x \in \mathbb{Z}$.

□

⁷ Note that this interpretation does not necessarily mean that the considered operations $\cdot$ and $+$ are commutative.





Exercise 2.2.4.3. Prove that we cannot build a group on $\mathbb{N}$ by using the standard addition $+$ on $\mathbb{N}$. □

As already mentioned above, we are mainly interested in $\mathbb{Z}_n = \{0, 1, \ldots, n - 1\}$. To define a "natural" addition and multiplication on $\mathbb{Z}_n$ one needs the notions of "divisibility" and "remainder of the division". Let $a, d$ be non-negative integers, and let $k$ be a positive integer. If $a = k \cdot d$, then we say that $d$ divides $a$ or that $a$ is a multiple of $d$, and write $d \mid a$. By agreement, every positive integer divides 0. Obviously, if $a$ and $d$ are positive integers and $d \mid a$, then $d \leq a$.

If $d \mid a$ we also say that $d$ is a divisor of $a$. For every positive integer $a$ the integers 1 and $a$ are so-called trivial divisors of $a$. Nontrivial divisors of $a$ are called factors of $a$. For instance, the factors of 24 are 2, 3, 4, 6, 8, and 12.

The definition of the operations on $\mathbb{Z}_n$ is based on the elementary division of a positive integer $a$ by a positive integer $b$ for $b < a$. Using the school algorithm for division one can write
$$a = q \cdot b + r, \qquad (2.9)$$
where $q$ is the result of the division procedure and $r < b$ is the remainder of the division. One can easily prove that for every integer $a$ and every positive integer $b$, there are unique integers $q$ and $r$ with $0 \leq r < b$ such that equality (2.9) holds. In what follows we use the notation $a \operatorname{div} b$ for $q$ and $a \bmod b$ for $r$. For instance, $21 \operatorname{div} 5 = 4$ and $21 \bmod 5 = 1$. For every positive integer $n$, we define the operations $\oplus_n$ and $\odot_n$ on $\mathbb{Z}_n$ as follows: for all $x, y \in \mathbb{Z}_n$,
$$x \oplus_n y = (x + y) \bmod n,$$
$$x \odot_n y = (x \cdot y) \bmod n.$$
For instance, $7 \oplus_{13} 10 = (7 + 10) \bmod 13 = 17 \bmod 13 = 4$ and $6 \odot_{11} 7 = (6 \cdot 7) \bmod 11 = 42 \bmod 11 = 9$.

Since the remainder of the division by $n$ is smaller than $n$, $x \oplus_n y, x \odot_n y \in \mathbb{Z}_n$ for all $x, y \in \mathbb{Z}_n$. So, $\oplus_n$ and $\odot_n$ are binary operations on $\mathbb{Z}_n$.
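In Python, the built-in `divmod(a, b)` and the `%` operator return exactly the unique $q$ and $r$ with $0 \le r < b$ for every integer $a$ and positive integer $b$, so $\oplus_n$ and $\odot_n$ become one-liners. A sketch reproducing the examples above (the function names are ours):

```python
def div_mod(a, b):
    """Unique (q, r) with a = q*b + r and 0 <= r < b, for an integer a and a positive integer b."""
    if b <= 0:
        raise ValueError("b must be positive")
    return divmod(a, b)      # floor division gives 0 <= r < b whenever b > 0


def add_n(x, y, n):
    return (x + y) % n       # x ⊕_n y


def mul_n(x, y, n):
    return (x * y) % n       # x ⊙_n y


print(div_mod(21, 5))        # (4, 1)
print(add_n(7, 10, 13))      # 4
print(mul_n(6, 7, 11))       # 9
print(div_mod(-7, 5))        # (-2, 3), since -7 = (-2)*5 + 3
```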

Example 2.2.4.4. Consider $\mathbb{Z}_n = \{0, 1, \ldots, n - 1\}$ and the operation $\oplus_n$ as defined above. Then 0 can be considered as the neutral element according to $\oplus_n$. For every $x \in \mathbb{Z}_n$, define $i(x) = (n - x) \bmod n$. We observe that, for every $x$,
$$x \oplus_n i(x) = (x + (n - x) \bmod n) \bmod n = (x + (n - x)) \bmod n = n \bmod n = 0.$$
Since $\oplus_n$ is associative and commutative, $(\mathbb{Z}_n, \oplus_n)$ with the neutral element 0 is a commutative group. □
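For small $n$ the group axioms of Definition 2.2.4.1 can be verified by brute force. The sketch below (names are hypothetical; the running time is cubic in $|S|$) confirms that $(\mathbb{Z}_{12}, \oplus_{12})$ is a group and that $i(x) = (n - x) \bmod n$ is the inverse of $x$.

```python
def is_group(S, op, e):
    """Brute-force check of closure and the axioms (ii)-(iv) of Definition 2.2.4.1 on a finite set S."""
    closed = all(op(x, y) in S for x in S for y in S)
    associative = all(op(op(x, y), z) == op(x, op(y, z)) for x in S for y in S for z in S)
    neutral = all(op(e, x) == x == op(x, e) for x in S)
    inverses = all(any(op(y, x) == e == op(x, y) for y in S) for x in S)
    return closed and associative and neutral and inverses


def add_mod(n):
    def op(x, y):
        return (x + y) % n   # x ⊕_n y
    return op


n = 12
Z_n = set(range(n))
print(is_group(Z_n, add_mod(n), 0))                         # True
print(all(add_mod(n)(x, (n - x) % n) == 0 for x in Z_n))    # True
```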



The crucial point is that if one has a structure (algebra) like a group $(S, \oplus)$, where there exists an inverse element $i(x)$ according to addition for every $x \in S$, then one also has subtraction in this structure.⁸ One can define it as
$$x - y = x + i(y).$$
In what follows we shall also use the notation $-y$ for $i(y)$, where $i(y)$ is the inverse of $y$ according to addition.

A similar situation appears if one considers a group $(S, \cdot)$, where $\cdot$ stands for multiplication. If $i(x)$ is the inverse of $x$ according to $\cdot$, then one also has division in the group. Division is defined by
$$x / y = x \cdot i(y),$$
and we often use the notation $y^{-1}$ instead of $i(y)$.

If one considers the set $\mathbb{Z}$ and the standard multiplication operation, one can easily observe that it is impossible to build a group. The neutral element according to multiplication is 1, and for no $x \in \mathbb{Z}$ with $x \notin \{1, -1\}$ are we able to find an element $y \in \mathbb{Z}$ such that $x \cdot y = 1$. Thus, we do not have inverse elements according to multiplication in $\mathbb{Z}$, and we cannot define division in such a structure.

In what follows we also use the abbreviation $a^i$, inductively defined for every $a \in S$ and every $i \in \mathbb{Z}$ in a group $(S, *)$ with the neutral element $e$ as follows:

(i) $a^0 = e$, $a^1 = a$, and $a^{-1} = i(a)$, where $a^0$ is called the trivial power of $a$;

(ii) $a^{i+1} = a * a^i$ for every positive integer $i$, and $a^i$ is called a nontrivial power of $a$ if $i \geq 1$;

(iii) $a^{-j} = (i(a))^j$ for every positive integer $j$.

An element g ∈ S is called a generator of a group (S, *) if S = {g^i | i ∈ ℤ}. If a group has a generator, then the group is called cyclic. For instance, (ℤ_n, ⊕_n) is a cyclic group for every n ∈ ℕ^+ because 1^i = i for every i ∈ {1, 2, ..., n − 1} and 1^0 = 0, i.e., 1 is a generator of (ℤ_n, ⊕_n). Observe that 2 is a generator of (ℤ_n, ⊕_n) if and only if n is an odd integer.
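The generator claim is easy to check by brute force. The following is a minimal Python sketch, and its helper names are hypothetical (they do not come from the text). In the additive group (ℤ_n, ⊕_n) the power g^i is simply i · g mod n, so g is a generator exactly when these multiples cover all of ℤ_n.

```python
def additive_powers(g, n):
    """The elements g^i of (Z_n, +_n); in additive notation g^i = i*g mod n.

    Taking i = 0, ..., n-1 suffices because i*g mod n is periodic in i with period n.
    """
    return {(i * g) % n for i in range(n)}


def is_additive_generator(g, n):
    # g generates (Z_n, +_n) iff its multiples reach all n residues 0, ..., n-1
    return len(additive_powers(g, n)) == n


# 2 generates Z_n for odd n only (here n = 3, ..., 12)
print([n for n in range(3, 13) if is_additive_generator(2, n)])  # [3, 5, 7, 9, 11]
```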

Definition 2.2.4.5. A semigroup is any algebra (S, *), where * is an associative binary operation on S.

A monoid is an algebra (M, *) where

(i) * is a binary associative operation, and

(ii) there exists an e ∈ M such that ∀x ∈ M: e * x = x = x * e.

A monoid is commutative if ∀x, y ∈ M: x * y = y * x.

8 Observe that subtraction, considered as a function from S^2 to S, is not associative. This is the reason why one does not use the notion “operation” for it. The same holds for division.







Example 2.2.4.6. (ℕ, +) with the neutral element 0 and (ℤ, ·) with the neutral element 1 are commutative monoids. (Σ*, ·), where Σ is an alphabet (a finite set of symbols), Σ* is the set of all finite sequences over elements of Σ, and · is the concatenation of two sequences, is a (noncommutative) monoid. The neutral element of (Σ*, ·) is the empty sequence λ. □

Now, we want to consider algebras that contain both addition and multi- 
plication. 

Definition 2.2.4.7. A ring is an algebra (R, +, ·) such that

(i) (R, +) is a commutative group,

(ii) (R, ·) is a semigroup, and

(iii) addition and multiplication are related by the distributive laws:

$$x \cdot (y + z) = (x \cdot y) + (x \cdot z),$$
$$(x + y) \cdot z = (x \cdot z) + (y \cdot z).$$

A ring (R, +, ·) with neutral element 0 according to + is called zero division free if x · y ≠ 0 for all x, y ∈ R − {0}.

Example 2.2.4.8. Each of the sets ℤ, ℚ, and ℝ with standard addition and multiplication is a zero division free ring. □

Because of (i) of Definition 2.2.4.7, subtraction is well defined in every ring. In general, however, division is not available in rings. To be able to divide, one has to impose additional requirements.

Definition 2.2.4.9. A field is an algebra (R, +, ·) satisfying the following conditions:

(i) (R, +, ·) is a zero division free ring with the neutral element 0 according to +,

(ii) a · b = b · a for all a, b ∈ R,

(iii) there exists an element 1 ∈ R such that 1 · a = a · 1 = a for all a ∈ R − {0},

(iv) for every b ∈ R − {0} there exists i(b) ∈ R such that b · i(b) = 1.

ℚ and ℝ, with respect to addition and multiplication, form fields, but for ℤ it is impossible to define division. In what follows we observe that ℤ_n can be used to build a field for some n, but not for every n. The reason for this behavior will be explained later.

Example 2.2.4.10. Consider ℤ_n for every positive integer n, with the operations ⊕_n and ⊙_n. As we already observed, (ℤ_n, ⊕_n) is a commutative group. One can easily verify that the distributive laws for ⊕_n and ⊙_n hold, too. Obviously, (ℤ_n, ⊙_n) is a semigroup. Thus, the only properties in question are the zero divisor freeness and the existence of inverse elements according to ⊙_n.







If n = 12, then 3 ⊙_12 4 = 3 · 4 mod 12 = 0. Since 3 ≠ 0 and 4 ≠ 0, the ring (ℤ_12, ⊕_12, ⊙_12) is not zero division free. Obviously, the same is true for every n that can be written as a · b with a, b ∈ {2, 3, ..., n − 1}.

Now, let us look for the inverse elements of the elements of ℤ_12 according to ⊙_12. Since 1 ⊙_12 1 = 1 · 1 mod 12 = 1, 1 can always be considered to be the inverse of 1, i.e., 1^{−1} = 1/1 = 1 in ℤ_12. Next, consider the element 2. Since 2 · a is even for every a ∈ ℤ_12 and 12 is even, 2 · a mod 12 is even, too. Thus, 2 ⊙_12 a ≠ 1 for every a ∈ ℤ_12. The conclusion is that there is no inverse element of 2 according to ⊙_12.

Now we consider ℤ_5. Performing all 16 multiplications of nonzero elements, one can see that (ℤ_5, ⊕_5, ⊙_5) is zero division free. In what follows we see that every nonzero element has its inverse according to ⊙_5:

1 ⊙_5 1 = 1 · 1 mod 5 = 1, i.e., 1^{−1} = 1,
2 ⊙_5 3 = 2 · 3 mod 5 = 1, i.e., 2^{−1} = 3 and 3^{−1} = 2 because 2 ⊙_5 3 = 3 ⊙_5 2,
4 ⊙_5 4 = 4 · 4 mod 5 = 16 mod 5 = 1, i.e., 4^{−1} = 4.

Thus, (ℤ_5, ⊕_5, ⊙_5) is a field. □
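The case analysis of Example 2.2.4.10 can be repeated mechanically for any small n. The Python sketch below uses hypothetical function names. It checks only the two properties the example singles out, zero division freeness and the existence of multiplicative inverses, because the other field axioms hold for every n ≥ 2.

```python
def multiplicative_inverses(n):
    """Map each nonzero a in Z_n to b with a*b mod n = 1, for those a that have such a b."""
    inverses = {}
    for a in range(1, n):
        for b in range(1, n):
            if (a * b) % n == 1:
                inverses[a] = b
                break
    return inverses


def is_zero_division_free(n):
    """True iff a*b mod n != 0 for all nonzero a, b in Z_n."""
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))


def is_field(n):
    # The remaining field axioms hold for every n >= 2, so only the two
    # properties singled out in Example 2.2.4.10 need checking.
    return n >= 2 and is_zero_division_free(n) and len(multiplicative_inverses(n)) == n - 1


print(multiplicative_inverses(5))                 # {1: 1, 2: 3, 3: 2, 4: 4}
print(multiplicative_inverses(12))                # {1: 1, 5: 5, 7: 7, 11: 11}
print([n for n in range(2, 20) if is_field(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```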

In what follows, we usually omit the full description of algebras as tuples. For instance, we shall simply speak about a field ℚ, ℤ_5, or ℝ, assuming that the reader automatically considers the standard operations connected with them.

Next, we shall deal with integers, and especially with primes. 

Definition 2.2.4.11. A prime is an integer p > 1 which has no factors (divisors⁹) other than itself and one.

An integer b > 1 that is not a prime (i.e., b = a · c for some a, c > 1) is called composite.

The smallest primes are the numbers 2, 3, 5, 7, 11, 13, 17, 19, 23, .... The importance of the class of primes is that every positive integer greater than 1 can be expressed as a product of primes (if a number is not itself a prime, it may be successively factorized until all the factors are primes).

Observation 2.2.4.12. For every integer a ∈ ℕ_{≥2}, there exist an integer k ≥ 1, primes p_1, p_2, ..., p_k, and i_1, i_2, ..., i_k ∈ ℕ_{≥1} such that

$$a = p_1^{i_1} \cdot p_2^{i_2} \cdots p_k^{i_k}.$$

For instance, 720 = 5 · 144 = 5 · 2 · 72 = 5 · 2 · 2 · 36 = 5 · 2 · 2 · 3 · 12 = 5 · 2 · 2 · 3 · 3 · 4 = 5 · 2 · 2 · 3 · 3 · 2 · 2 = 2^4 · 3^2 · 5.
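A straightforward way to compute the factorization of Observation 2.2.4.12 is trial division. The Python sketch below uses a hypothetical function name. It divides out each candidate d = 2, 3, 4, ... as often as possible. It is a simple illustration rather than an efficient method, and it splits off factors in a different order than the hand computation above.

```python
def factorize(a):
    """Return the prime factorization of an integer a >= 2 as a dict {prime: exponent}."""
    if a < 2:
        raise ValueError("a must be at least 2")
    factors = {}
    d = 2
    while d * d <= a:
        while a % d == 0:          # divide out d as often as possible
            factors[d] = factors.get(d, 0) + 1
            a //= d
        d += 1
    if a > 1:                      # the remaining cofactor has no factor <= its square root, so it is prime
        factors[a] = factors.get(a, 0) + 1
    return factors


print(factorize(720))   # {2: 4, 3: 2, 5: 1}, i.e., 720 = 2^4 * 3^2 * 5
```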



9 Remember that an integer a ≠ 1 is said to be a factor (or a nontrivial divisor) of an integer b if there is some integer c ≠ 1 such that b = a · c (i.e., b mod a = 0).







One of the first questions that arises concerning the class of primes is whether there are infinitely many different primes or whether the class of all primes is finite. The following assertion answers this question.

Theorem 2.2.4.13. There are infinitely many different primes. 

Proof. We present Euclid’s classical proof by contradiction. Assume there are only finitely many different primes, say p_1, p_2, ..., p_n. Any other number is composite and must be divisible by at least one of the primes p_1, p_2, ..., p_n. Consider the number

$$a = p_1 \cdot p_2 \cdots p_n + 1.$$

Since a > p_i for every i ∈ {1, 2, ..., n}, a must be composite, so there exists an i ∈ {1, 2, ..., n} such that p_i is a factor of a. But a mod p_j = 1 for every j ∈ {1, 2, ..., n}, which is a contradiction. Hence our initial assumption that there is only a finite number of primes is false, and its contrary must be true. □

Observe that the proof of Theorem 2.2.4.13 does not provide any method for constructing an arbitrarily large sequence of primes. This is because a = p_1 · p_2 ··· p_n + 1 does not need to be a prime (for instance, 2 · 3 · 5 · 7 · 11 · 13 + 1 = 30031 = 59 · 509 is a composite number), and if a is a prime, then it does not need to be the (n + 1)th prime (2 · 3 + 1 = 7, but 7 is the 4th prime number, not the 3rd one). The only right conclusion of the proof of Theorem 2.2.4.13 is that there must exist a prime p greater than p_n and smaller than or equal to a. Therefore, one can search in the interval (p_n, a] for the (n + 1)th smallest prime.¹⁰
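This observation can be explored numerically with nothing beyond trial division. The Python sketch below uses hypothetical names throughout. It forms a = p_1 ··· p_n + 1 and reports its smallest prime factor q. This q is a prime that is greater than p_n and at most a, although it need not be the (n + 1)th prime.

```python
def smallest_prime_factor(a):
    """Smallest prime dividing a >= 2 (trial division)."""
    d = 2
    while d * d <= a:
        if a % d == 0:
            return d
        d += 1
    return a


def first_primes(n):
    """The first n primes p_1 < p_2 < ... < p_n."""
    primes, candidate = [], 2
    while len(primes) < n:
        if smallest_prime_factor(candidate) == candidate:
            primes.append(candidate)
        candidate += 1
    return primes


def euclid_number(n):
    """a = p_1 * ... * p_n + 1 together with its smallest prime factor."""
    a = 1
    for p in first_primes(n):
        a *= p
    return a + 1, smallest_prime_factor(a + 1)


for n in range(1, 7):
    a, q = euclid_number(n)
    print(n, first_primes(n)[-1], a, q)   # always p_n < q <= a
```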

As already mentioned above, primes are important because each integer greater than 1 can be expressed as a product of primes. The Fundamental Theorem of Arithmetic claims that every integer a > 1 can be factorized into primes in only one way.

Theorem 2.2.4.14 (The Fundamental Theorem of Arithmetic). For every integer a ∈ ℕ_{≥2}, a can be expressed as a product of nontrivial powers of distinct primes¹¹

$$a = p_1^{i_1} \cdot p_2^{i_2} \cdots p_k^{i_k},$$

and, up to a rearrangement of the factors, this prime factorization is unique.

Proof. The fact that a can be expressed as a product of primes has been formulated in Observation 2.2.4.12. We have to prove that the decomposition of a into a product of primes is unique. We again give an indirect proof. Let m be the smallest integer from ℕ_{≥2} such that

$$m = p_1 \cdot p_2 \cdots p_r = q_1 \cdot q_2 \cdots q_s, \tag{2.10}$$

10 Note that up till now no simple (efficiently computable) formula yielding an infinite sequence of primes is known.

11 p_i ≠ p_j for i ≠ j.







where p_1 ≤ p_2 ≤ ... ≤ p_r and q_1 ≤ q_2 ≤ ... ≤ q_s are primes such that there exists i ∈ {1, ..., r} with p_i ∉ {q_1, q_2, ..., q_s}.¹²

First, we observe that p_1 cannot be equal to q_1. (If p_1 = q_1, then m/p_1 = p_2 ··· p_r = q_2 ··· q_s is an integer with two different prime factorizations. Since m/p_1 < m, this contradicts the minimality of m.)

Without loss of generality we assume p_1 < q_1. Consider the integer

$$m' = m - (p_1 \cdot q_2 \cdot q_3 \cdots q_s). \tag{2.11}$$

By substituting the two expressions of (2.10) for m into (2.11), we may write the integer m' in either of the two forms:

$$m' = (p_1 p_2 \cdots p_r) - (p_1 q_2 \cdots q_s) = p_1 (p_2 p_3 \cdots p_r - q_2 q_3 \cdots q_s), \tag{2.12}$$
$$m' = (q_1 q_2 \cdots q_s) - (p_1 q_2 \cdots q_s) = (q_1 - p_1)(q_2 q_3 \cdots q_s). \tag{2.13}$$

Since p_1 < q_1, it follows from (2.13) that m' is a positive integer, while (2.11) implies that m' < m. Hence, the prime decomposition of m' must be unique.¹³ Equality (2.12) implies that p_1 is a factor of m'. On the other hand, by (2.13), a prime decomposition of m' is obtained by combining the prime decompositions of (q_1 − p_1) and of (q_2 q_3 ··· q_s). Since the decomposition of m' is unique, p_1 is a factor of either (q_1 − p_1) or (q_2 q_3 ··· q_s). The latter is impossible because q_s ≥ q_{s−1} ≥ ... ≥ q_2 ≥ q_1 > p_1 and q_1, ..., q_s are primes. Hence p_1 must be a factor of q_1 − p_1, i.e., there exists an integer a such that

$$q_1 - p_1 = p_1 \cdot a. \tag{2.14}$$

But (2.14) implies q_1 = p_1(a + 1), and so p_1 is a factor of q_1. This is impossible because p_1 < q_1 and q_1 is a prime. □

Corollary 2.2.4.15. If a prime p is a factor of an integer a · b, then p must be a factor of either a or b.

Proof. Assume that p is a factor of neither a nor b. Then the product of the prime decompositions of a and b does not contain p. Since p is a factor of a · b, there exists an integer t such that a · b = p · t. The product of p and the prime decomposition of t yields a prime decomposition of a · b that contains p. Thus, we have two different prime decompositions of a · b, which contradicts the Fundamental Theorem of Arithmetic. □

A large effort in number theory has been devoted to finding a simple mathematical formula yielding the sequence of all primes. Unfortunately, these attempts have not succeeded up till now, and it is questionable whether such a formula exists. On the other hand, another important question,

“How many primes are contained in {1, 2, ..., n}?”,

has been (at least) approximately answered. The next deep result is one of the most fundamental assertions of number theory, and it has numerous applications (see Section 5.2 for some applications in algorithmics). The proof of this assertion is beyond the elementary level of this chapter, and we omit it here.

12 So, the case r = s = 1 cannot happen because m = p_1 = q_1 contradicts p_1 ∉ {q_1}.

13 Aside from the order of the factors.

Let Prim(n) denote the number of primes among the integers 1, 2, 3, ..., n.

Theorem 2.2.4.16 (Prime Number Theorem).

$$\lim_{n \to \infty} \frac{\mathrm{Prim}(n)}{n / \ln n} = 1.$$

□



In other words, the Prime Number Theorem says that the density Prim(n)/n of the primes among the first n integers is asymptotically equal to 1/ln n, i.e., the ratio of the two tends to 1 as n increases. The following table shows that 1/ln n is a good “approximation” of Prim(n)/n already for “small” n (for which we are able to compute Prim(n) exactly with a computer).

n       Prim(n)/n    1/ln n    Prim(n)/(n/ln n)
10^3    0.168        0.145     1.159
10^6    0.0785       0.0724    1.084
10^9    0.0508       0.0483    1.053



Note that for n > 100 one can prove that

$$1 < \frac{\mathrm{Prim}(n)}{n / \ln n} < 1.26,$$

and this is already very useful in the design of randomized algorithms.
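The first two rows of the table and the bound above can be reproduced on a computer. The following is a minimal Python sketch that relies on the sieve of Eratosthenes, a standard technique that is not discussed in this section. It counts the primes up to 10^6 and compares Prim(n)/n with 1/ln n. The 10^9 row would need a segmented sieve, so it is left out.

```python
import math


def prime_counts(limit):
    """Return c with c[n] = Prim(n) for 0 <= n <= limit (sieve of Eratosthenes, limit >= 2)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # cross out p*p, p*p + p, ...; smaller multiples were crossed out by smaller primes
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    counts, running = [0] * (limit + 1), 0
    for n in range(limit + 1):
        running += is_prime[n]
        counts[n] = running
    return counts


counts = prime_counts(10**6)
for n in (10**3, 10**6):
    density, approx = counts[n] / n, 1 / math.log(n)
    print(n, round(density, 4), round(approx, 4), round(density / approx, 3))
```

If you scan every n from 101 to 10^5 this way, the ratio Prim(n)/(n/ln n) stays strictly between 1 and 1.26. Its largest value, about 1.255, occurs at n = 113.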
In what follows we use Gauss’ congruence notation

$$a \equiv b \pmod{d}$$

for the fact

$$a \bmod d = b \bmod d,$$

and say that a and b are congruent modulo d. If a is not congruent to b modulo d, we shall write

$$a \not\equiv b \pmod{d}.$$



Exercise 2.2.4.17. Prove that, for all positive integers a ≥ b and d, the following statements are equivalent:

(i) a ≡ b (mod d),

(ii) a = b + nd for some integer n,

(iii) d divides a − b. □







Definition 2.2.4.18. For all integers a and b with a ≠ 0 or b ≠ 0, the greatest common divisor of a and b, gcd(a, b), and the lowest common multiple of a and b, lcm(a, b), are defined as follows:

$$\gcd(a, b) = \max\{d \mid d \text{ divides both } a \text{ and } b, \text{ i.e., } a \equiv b \equiv 0 \pmod{d}\}$$

and

$$\mathrm{lcm}(a, b) = \frac{a \cdot b}{\gcd(a, b)} = \min\{c \geq 1 \mid a \mid c \text{ and } b \mid c\}.$$

By convention, gcd(0, 0) = lcm(0, 0) = 0. We say that a and b are coprimes if gcd(a, b) = 1, i.e., if a and b have no common factor.

For instance, the greatest common divisor of 24 and 30 is 6. It is clear that if one has the prime factorizations of two numbers a = p_1 ··· p_r · q_1 ··· q_s and b = p_1 ··· p_r · h_1 ··· h_m, where {q_1, ..., q_s} ∩ {h_1, ..., h_m} = ∅, then gcd(a, b) = p_1 · p_2 ··· p_r. For instance, 60 = 2 · 2 · 3 · 5 and 24 = 2 · 2 · 2 · 3, and so gcd(60, 24) = 2 · 2 · 3 = 12. Unfortunately, we do not know how to efficiently¹⁴ compute the prime factorization of a given integer, and so this idea is not helpful if one wants to compute gcd(a, b) for two given integers a, b. A very efficient computation of gcd(a, b) is provided by the well-known Euclid’s algorithm.¹⁵ To explain it we present some simple assertions. First, we give an important property of common divisors.

Lemma 2.2.4.19. Let d > 0, a, b be some integers.

(i) If d | a and d | b, then

$$d \mid (ax + by)$$

for any integers x and y.

(ii) For any two positive integers a, b, a | b and b | a imply a = b.

Proof. (i) If d | a and d | b, then a = n · d and b = m · d for some integers n and m. So,

$$ax + by = n \cdot d \cdot x + m \cdot d \cdot y = d(n \cdot x + m \cdot y).$$

(ii) a | b implies a ≤ b, and b | a implies b ≤ a. So, a = b. □

The following properties of gcd are obvious.

Observation 2.2.4.20. For all positive integers a, b, and k,

(i) gcd(a, b) = gcd(b, a),

(ii) gcd(a, 0) = a,

(iii) gcd(a, ka) = a for every integer k,

(iv) if k | a and k | b, then k | gcd(a, b).

14 There is no known polynomial-time algorithm for this task. 

15 Euclid’s algorithm is one of the oldest algorithms (circa 300 B.C.). 







Exercise 2.2.4.21. Prove that the gcd operator is associative, i.e., that for all integers a, b, and c,

$$\gcd(a, \gcd(b, c)) = \gcd(\gcd(a, b), c).$$

□



Euclid’s algorithm is based on the following recursive property of gcd.

Theorem 2.2.4.22. For any non-negative integer a and any positive integer b,

$$\gcd(a, b) = \gcd(b, a \bmod b).$$

Proof. We shall prove that gcd(a, b) and gcd(b, a mod b) divide each other, and so, by fact (ii) of Lemma 2.2.4.19, they must be equal.

(1) We first prove that gcd(a, b) divides gcd(b, a mod b). Following the definition of gcd, gcd(a, b) | a and gcd(a, b) | b. We know (cf. (2.9)) that we can write a as

$$a = (a \operatorname{div} b) \cdot b + a \bmod b$$

for some non-negative integer a div b. Thus

$$a \bmod b = a - (a \operatorname{div} b) \cdot b.$$

This means that a mod b is a linear combination of a and b, and so every common divisor of a and b must divide a mod b due to fact (i) of Lemma 2.2.4.19. Thus,

$$\gcd(a, b) \mid (a \bmod b).$$

Finally, due to fact (iv) of Observation 2.2.4.20,

$$\gcd(a, b) \mid b \quad\text{and}\quad \gcd(a, b) \mid (a \bmod b)$$

imply

$$\gcd(a, b) \mid \gcd(b, a \bmod b).$$

(2) We shall prove that gcd(b, a mod b) divides gcd(a, b). Obviously,

$$\gcd(b, a \bmod b) \mid b \quad\text{and}\quad \gcd(b, a \bmod b) \mid (a \bmod b).$$

Since

$$a = (a \operatorname{div} b) \cdot b + a \bmod b,$$

we see that a is a linear combination of b and a mod b. Due to assertion (i) of Lemma 2.2.4.19,

$$\gcd(b, a \bmod b) \mid a.$$

Following assertion (iv) of Observation 2.2.4.20,







$$\gcd(b, a \bmod b) \mid b \quad\text{and}\quad \gcd(b, a \bmod b) \mid a$$

together imply

$$\gcd(b, a \bmod b) \mid \gcd(a, b).$$

□



Theorem 2.2.4.22 directly implies the correctness of the following recursive algorithm for gcd.

Algorithm 2.2.4.23. Euclid’s Algorithm

Euclid(a, b)

Input: two positive integers a, b.

Recursive Step: if b = 0 then return (a)
                else return Euclid(b, a mod b).
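Algorithm 2.2.4.23 translates almost line by line into Python. The sketch below uses hypothetical function names. It gives the recursive version together with an iterative variant that also records the recursive calls, so you can replay the run of Example 2.2.4.24.

```python
def euclid(a, b):
    """Algorithm 2.2.4.23: gcd(a, b) via the recursion gcd(a, b) = gcd(b, a mod b)."""
    if b == 0:
        return a
    return euclid(b, a % b)


def euclid_trace(a, b):
    """Iterative form of the same recursion; also returns the argument pairs of all recursive calls."""
    calls = []
    while b != 0:
        a, b = b, a % b
        calls.append((a, b))
    return a, calls


print(euclid(12750, 136))        # 34
print(euclid_trace(12750, 136))  # (34, [(136, 102), (102, 34), (34, 0)])
```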

Example 2.2.4.24. Consider the computation of Euclid’s algorithm for a = 12750 and b = 136.

Euclid(12750, 136) = Euclid(136, 102)
                   = Euclid(102, 34)
                   = Euclid(34, 0)
                   = 34.

Thus, gcd(12750, 136) = 34. □

Obviously, Euclid’s algorithm cannot recurse indefinitely, since the second argument is a non-negative integer that strictly decreases in each recursive call. Moreover, the above example shows that three recursive calls are enough to compute the gcd of 12750 and 136. Since each call performs only one mod operation, the number of arithmetic operations of Euclid’s algorithm is proportional to the number of recursive calls it makes. The assertion of the following exercise shows that this algorithm is very efficient.

Exercise 2.2.4.25. Let a > b > 0 be two integers. Prove that the number of recursive calls of Euclid(a, b) is in O(log_2 b).
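Before proving the bound, you can look at it experimentally. The Python sketch below uses a hypothetical helper name. It counts the recursive calls of Euclid(a, b) and checks them against the explicit bound 2 · log_2 b + 2 for all small a > b > 0. This is evidence, not a proof, and the constant in the bound is this sketch's own choice. Consecutive Fibonacci numbers such as 13 and 8 are a well-known slow case.

```python
import math


def euclid_call_count(a, b):
    """Number of recursive calls made by Euclid(a, b), computed with an equivalent loop."""
    calls = 0
    while b != 0:
        a, b = b, a % b
        calls += 1
    return calls


print(euclid_call_count(12750, 136))  # 3, as in Example 2.2.4.24
print(euclid_call_count(13, 8))       # 5: (8, 5), (5, 3), (3, 2), (2, 1), (1, 0)

worst = max(euclid_call_count(a, b) - 2 * math.log2(b) for b in range(1, 300) for a in range(b + 1, 300))
print(worst <= 2)  # True: calls <= 2*log2(b) + 2 on this range
```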

There are many situations of both theoretical and practical interest in which one asks whether a given integer p is a prime or not. Following the definition of a prime, one can decide this by looking at all numbers 2, ..., ⌊√p⌋ and testing whether one of them is a factor of p. If one wants to check that p is a prime in this way, Ω(√p) divisions have to be performed. But even the best computer cannot perform √p divisions if p is a number consisting of hundreds of digits, and we need to work with such large numbers in practice. This is one of the reasons why people have searched for other characterizations¹⁶ of primes.



16 Equivalent definitions.
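In Python, the naive test described above looks as follows. This is a minimal sketch with a hypothetical function name. It performs up to ⌊√p⌋ − 1 divisions, which is hopeless when p has hundreds of digits.

```python
import math


def is_prime_trial_division(p):
    """Decide primality by testing the candidate factors 2, ..., floor(sqrt(p))."""
    if p < 2:
        return False
    for d in range(2, math.isqrt(p) + 1):   # math.isqrt needs Python 3.8+
        if p % d == 0:
            return False                    # d is a nontrivial factor of p
    return True


print([p for p in range(2, 30) if is_prime_trial_division(p)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```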







Theorem 2.2.4.26. For every positive integer p > 2,

p is a prime if and only if ℤ_p is a field.



Proof. In Example 2.2.4.10 we have shown that if p is composite (p = a · b for a > 1, b > 1), then ℤ_p with ⊕_p, ⊙_p cannot form a field since a ⊙_p b = 0.

We showed already in Example 2.2.4.4 that (ℤ_n, ⊕_n) is a commutative group for every positive integer n. So, one can easily see that (ℤ_n, ⊕_n, ⊙_n) is a ring for every positive integer n. Since ⊙_n is commutative and 1 ⊙_n a = a for every a ∈ ℤ_n, it is sufficient to prove that the primality of n implies the existence of an inverse element a^{−1} for every a ∈ ℤ_n − {0}. (Such inverses also make ℤ_n zero division free: if a ⊙_n b = 0 with b ≠ 0, then a = a ⊙_n (b ⊙_n b^{−1}) = (a ⊙_n b) ⊙_n b^{−1} = 0.)

Now, let p be a prime. For every a ∈ ℤ_p − {0}, consider the following p multiples of a:

$$m_0 = 0 \cdot a,\ m_1 = 1 \cdot a,\ m_2 = 2 \cdot a,\ \ldots,\ m_{p-1} = (p - 1) \cdot a.$$

First, we prove that no two of these integers can be congruent modulo p. Let there exist two different r, s ∈ {0, 1, ..., p − 1}, r > s, such that

$$m_r \equiv m_s \pmod{p}.$$

Then p is a factor of m_r − m_s = (r − s) · a. But this cannot occur because 0 < r − s < p (i.e., p is not a factor of r − s) and 0 < a < p.¹⁷ So, m_0, m_1, m_2, ..., m_{p−1} are pairwise different modulo p.

Since m_0 = 0, the numbers m_1, m_2, ..., m_{p−1} must be respectively congruent to the numbers 1, 2, 3, ..., p − 1, in some arrangement. So,

$$\{a \odot_p 0,\ a \odot_p 1,\ \ldots,\ a \odot_p (p - 1)\} = \{0, 1, \ldots, p - 1\}.$$

Now we are done, because this implies that there exists b ∈ ℤ_p such that a ⊙_p b = 1, i.e., b is the inverse element of a in ℤ_p. Since we have proved this for every a ∈ ℤ_p − {0}, ℤ_p is a field. □

Exercise 2.2.4.27. Prove that if p is a prime, then ({1, 2, ..., p − 1}, ⊙_p) is a cyclic group. □
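For small primes you can check the claim of Exercise 2.2.4.27 by listing the generators explicitly. The following is a minimal Python sketch with a hypothetical function name. An element g generates ({1, ..., p − 1}, ⊙_p) exactly when its powers g^1, ..., g^{p−1} mod p are pairwise distinct, i.e., when they cover the whole set.

```python
def multiplicative_generators(p):
    """All g in {1, ..., p-1} whose powers g^1, ..., g^(p-1) mod p cover {1, ..., p-1}."""
    generators = []
    for g in range(1, p):
        powers = {pow(g, i, p) for i in range(1, p)}   # pow(g, i, p) = g^i mod p
        if len(powers) == p - 1:
            generators.append(g)
    return generators


print(multiplicative_generators(7))   # [3, 5]
print(multiplicative_generators(11))  # [2, 6, 7, 8]
```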

Exercise 2.2.4.28. Define, for every positive integer n,

$$\mathbb{Z}_n^* = \{a \in \mathbb{Z}_n - \{0\} \mid \gcd(a, n) = 1\}.$$

Prove that

(i) (ℤ_n^*, ⊙_n) forms a group for every n ≥ 2,

(ii) the group (ℤ_n^*, ⊙_n) is cyclic if and only if n ∈ {2, 4}, n = p^k, or n = 2p^k for some positive integer k and an odd prime p. □



17 Note that this argument works because of the Fundamental Theorem of Arithmetic.







This nice characterization of primes by Theorem 2.2.4.26 leads directly to the following important result of number theory.

Theorem 2.2.4.29 (Fermat’s Theorem). For every prime p and every integer a such that gcd(a, p) = 1,

$$a^{p-1} \equiv 1 \pmod{p}.$$

Proof. Consider again¹⁸ the numbers

$$m_1 = 1 \cdot a,\ m_2 = 2 \cdot a,\ \ldots,\ m_{p-1} = (p - 1) \cdot a.$$

We claim, by almost the same argument as in the proof above, that no two of these integers can be congruent modulo p. Let there exist two different integers r, s ∈ {1, 2, ..., p − 1}, r > s, such that

$$m_r \equiv m_s \pmod{p}.$$

Then p is a factor of m_r − m_s = (r − s) · a. This cannot occur because 0 < r − s < p and p is not a factor of a according to our assumption gcd(a, p) = 1. Thus,

$$|\{m_1 \bmod p,\ m_2 \bmod p,\ \ldots,\ m_{p-1} \bmod p\}| = p - 1.$$

Now, we claim that none of the numbers m_1, m_2, ..., m_{p−1} is congruent to 0 modulo p. Since ℤ_p is a field, m_r = r · a ≡ 0 (mod p) forces either a ≡ 0 (mod p) or r ≡ 0 (mod p). But for every r ∈ {1, 2, ..., p − 1}, m_r = r · a, r < p, and gcd(a, p) = 1 (i.e., p is a factor of neither r nor a).

The conclusion is

$$\{m_1 \bmod p,\ m_2 \bmod p,\ \ldots,\ m_{p-1} \bmod p\} = \{1, 2, \ldots, p - 1\}. \tag{2.15}$$

Finally, consider the following number

$$m_1 \cdot m_2 \cdots m_{p-1} = 1 \cdot a \cdot 2 \cdot a \cdots (p - 1) \cdot a = 1 \cdot 2 \cdots (p - 1) \cdot a^{p-1}. \tag{2.16}$$

Following (2.15) we get

$$1 \cdot 2 \cdots (p - 1) \cdot a^{p-1} \equiv 1 \cdot 2 \cdots (p - 1) \pmod{p},$$

i.e.,

$$1 \cdot 2 \cdots (p - 1) \cdot (a^{p-1} - 1) \equiv 0 \pmod{p}.$$

Since 1 · 2 ··· (p − 1) ≢ 0 (mod p) and ℤ_p is a field (i.e., (ℤ_p, ⊕_p, ⊙_p) is zero division free), we obtain

$$a^{p-1} - 1 \equiv 0 \pmod{p}.$$

□



18 As in the proof of Theorem 2.2.4.26 







Exercise 2.2.4.30. Check Fermat’s Theorem for p = 5 and a = 9. □
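Exercise 2.2.4.30 is a one-line computation with Python's built-in three-argument pow(base, exponent, modulus), which performs modular exponentiation. The sketch below uses a hypothetical helper name. It also checks the theorem for many pairs (a, p) and shows that for a composite modulus the congruence can fail.

```python
def fermat_holds(a, p):
    """Check a^(p-1) ≡ 1 (mod p) with the built-in modular exponentiation pow(a, e, m)."""
    return pow(a, p - 1, p) == 1


print(9 ** 4, 9 ** 4 % 5, fermat_holds(9, 5))  # 6561 1 True

# Fermat's Theorem for small primes p and all a with gcd(a, p) = 1 (i.e., p does not divide a)
print(all(fermat_holds(a, p) for p in (2, 3, 5, 7, 11, 13) for a in range(1, 100) if a % p != 0))  # True

print(fermat_holds(2, 15))  # False: 15 is composite and 2^14 mod 15 = 4
```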

A nice consequence of Fermat’s Theorem is a method for computing the inverse element according to multiplication.

Corollary 2.2.4.31. Let p be a prime. Then, for every a ∈ ℤ_p − {0},

$$a^{-1} = a^{p-2} \bmod p.$$

Proof. a · a^{p−2} = a^{p−1} ≡ 1 (mod p) due to Fermat’s Theorem. □
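The corollary gives a direct recipe that needs only one modular exponentiation. The Python sketch below uses a hypothetical function name and, again, the built-in pow(base, exponent, modulus). The result is guaranteed to be the inverse only when p is prime.

```python
def inverse_mod_prime(a, p):
    """Corollary 2.2.4.31: the inverse of a in Z_p (p prime, p not dividing a) is a^(p-2) mod p."""
    if a % p == 0:
        raise ValueError("0 has no multiplicative inverse")
    return pow(a, p - 2, p)


print([inverse_mod_prime(a, 5) for a in range(1, 5)])  # [1, 3, 2, 4], matching Example 2.2.4.10
```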

In what follows we frequently use the notation −1 (the inverse of 1 according to addition) for p − 1 in ℤ_p. The following theorem provides a nice equivalent definition of primality.

Theorem 2.2.4.32. Let p > 2 be an odd integer. Then

$$p \text{ is a prime} \iff a^{(p-1)/2} \bmod p \in \{1, -1\} \text{ for all } a \in \mathbb{Z}_p - \{0\}.$$

Proof. (i) Let p = 2p' + 1, p' ≥ 1, be a prime. Following Fermat’s Theorem, a^{p−1} ≡ 1 (mod p) for every a ∈ ℤ_p − {0}. Since

$$a^{p-1} = a^{2p'} = (a^{p'} - 1) \cdot (a^{p'} + 1) + 1,$$

we can write

$$(a^{p'} - 1) \cdot (a^{p'} + 1) \equiv 0 \pmod{p}. \tag{2.17}$$

Since ℤ_p is a field, (2.17) implies

$$a^{p'} - 1 \equiv 0 \pmod{p} \quad\text{or}\quad a^{p'} + 1 \equiv 0 \pmod{p}. \tag{2.18}$$

Inserting p' = (p − 1)/2 into (2.18), we finally obtain

$$a^{(p-1)/2} \equiv 1 \pmod{p} \quad\text{or}\quad a^{(p-1)/2} \equiv -1 \pmod{p}.$$

(ii) Let a^{(p−1)/2} ≡ ±1 (mod p) for all a ∈ ℤ_p − {0}. By Theorem 2.2.4.26, it is sufficient to show that ℤ_p is a field. Obviously, a^{p−1} = a^{(p−1)/2} · a^{(p−1)/2}.

If a^{(p−1)/2} ≡ 1 (mod p), then a^{p−1} ≡ 1 (mod p).

If a^{(p−1)/2} ≡ −1 ≡ p − 1 (mod p), then

$$a^{p-1} \equiv (p - 1)^2 = p^2 - 2p + 1 \equiv 1 \pmod{p}.$$

So, for every a ∈ ℤ_p − {0}, a^{p−2} mod p is the inverse element a^{−1} of a.

To prove that ℤ_p is a field, it remains to show that if a · b ≡ 0 (mod p) for some a, b ∈ ℤ_p, then a ≡ 0 (mod p) or b ≡ 0 (mod p). Let a · b ≡ 0 (mod p), and let b ≢ 0 (mod p). Then there exists b^{−1} ∈ ℤ_p such that b · b^{−1} ≡ 1 (mod p). Finally,

$$a \equiv a \cdot (b \cdot b^{-1}) = (a \cdot b) \cdot b^{-1} \equiv 0 \cdot b^{-1} = 0 \pmod{p}.$$

In this way we proved that ℤ_p is zero division free, and so ℤ_p is a field. Hence, by Theorem 2.2.4.26, p is a prime.

□
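For small odd p you can check the characterization exhaustively. The Python sketch below uses a hypothetical function name. It tests every a ∈ ℤ_p − {0}, so it costs p − 1 modular exponentiations and is no faster than trial division. It only illustrates the equivalence; the text announces that Chapter 5 builds on an extension of this theorem for actual primality testing.

```python
def half_power_test(p):
    """Theorem 2.2.4.32 for odd p > 2: True iff a^((p-1)/2) mod p is 1 or p - 1 (= -1) for all a in Z_p - {0}."""
    e = (p - 1) // 2
    return all(pow(a, e, p) in (1, p - 1) for a in range(1, p))


print([p for p in range(3, 60, 2) if half_power_test(p)])
# [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
```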





In order to use an extension of Theorem 2.2.4.32 for primality testing in Chapter 5, we shall investigate the properties of ℤ_n when n is composite. Let, for instance, n = p · q for two distinct primes p and q. We know that ℤ_p and ℤ_q are fields. Now, consider the direct product ℤ_p × ℤ_q. The elements of ℤ_p × ℤ_q are pairs (a_1, a_2), where a_1 ∈ ℤ_p and a_2 ∈ ℤ_q. We define addition in ℤ_p × ℤ_q as

$$(a_1, a_2) \oplus_{p,q} (b_1, b_2) = ((a_1 + b_1) \bmod p,\ (a_2 + b_2) \bmod q),$$

and multiplication as

$$(a_1, a_2) \odot_{p,q} (b_1, b_2) = ((a_1 \cdot b_1) \bmod p,\ (a_2 \cdot b_2) \bmod q).$$

The idea is to show that ℤ_n and ℤ_p × ℤ_q are isomorphic, i.e., there exists a bijection h: ℤ_n → ℤ_p × ℤ_q such that

$$h(a \oplus_n b) = h(a) \oplus_{p,q} h(b) \quad\text{and}\quad h(a \odot_n b) = h(a) \odot_{p,q} h(b)$$

for all a, b ∈ ℤ_n. If one finds such an h, then one can view ℤ_n as ℤ_p × ℤ_q. We define h simply as follows:

For all a ∈ ℤ_n, h(a) = (a mod p, a mod q).

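One can simply verify that h is injective:¹⁹ if h(a) = h(b), then both p and q divide a − b, hence p · q = n divides a − b, and |a − b| < n forces a = b. Since |ℤ_n| = p · q = |ℤ_p × ℤ_q|, h is a bijection. For all a, b ∈ ℤ_n, a and b can be written as

$$a = a_1' \cdot p + a_1 = a_2' \cdot q + a_2 \quad\text{for some } a_1 < p \text{ and } a_2 < q,$$
$$b = b_1' \cdot p + b_1 = b_2' \cdot q + b_2 \quad\text{for some } b_1 < p \text{ and } b_2 < q,$$

i.e., h(a) = (a_1, a_2) and h(b) = (b_1, b_2). So, since p and q divide n,

$$
\begin{aligned}
h(a \oplus_n b) &= h((a + b) \bmod n) = ((a + b) \bmod p,\ (a + b) \bmod q) \\
&= ((a_1 + b_1) \bmod p,\ (a_2 + b_2) \bmod q) \\
&= (a_1, a_2) \oplus_{p,q} (b_1, b_2) = h(a) \oplus_{p,q} h(b).
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
h(a \odot_n b) &= h((a \cdot b) \bmod n) = ((a \cdot b) \bmod p,\ (a \cdot b) \bmod q) \\
&= ((a_1' b_1' p^2 + (a_1' b_1 + a_1 b_1') p + a_1 b_1) \bmod p,\ (a_2' b_2' q^2 + (a_2' b_2 + a_2 b_2') q + a_2 b_2) \bmod q) \\
&= ((a_1 \cdot b_1) \bmod p,\ (a_2 \cdot b_2) \bmod q) \\
&= (a_1, a_2) \odot_{p,q} (b_1, b_2) = h(a) \odot_{p,q} h(b).
\end{aligned}
$$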
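The two computations above can be confirmed mechanically for small moduli. The Python sketch below uses hypothetical names. It builds the table of h(a) = (a mod p, a mod q) and checks that h is a bijection that turns ⊕_n and ⊙_n into the componentwise operations ⊕_{p,q} and ⊙_{p,q}. In line with footnote 19, the check succeeds for coprime p and q, and it fails, already at injectivity, when they share a factor.

```python
def pair_map(a, p, q):
    """h(a) = (a mod p, a mod q)."""
    return (a % p, a % q)


def is_ring_isomorphism(p, q):
    """Brute-force check that h: Z_n -> Z_p x Z_q, n = p*q, is a bijection respecting + and *."""
    n = p * q
    images = [pair_map(a, p, q) for a in range(n)]
    if len(set(images)) != n:   # injective on n elements = bijective onto the n pairs
        return False
    for a in range(n):
        for b in range(n):
            (a1, a2), (b1, b2) = images[a], images[b]
            if images[(a + b) % n] != ((a1 + b1) % p, (a2 + b2) % q):
                return False
            if images[(a * b) % n] != ((a1 * b1) % p, (a2 * b2) % q):
                return False
    return True


print(is_ring_isomorphism(3, 5), is_ring_isomorphism(4, 9), is_ring_isomorphism(3, 3))  # True True False
```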

In general, n = p_1 · p_2 ··· p_k for pairwise distinct primes p_1, p_2, ..., p_k, and one considers the isomorphism between ℤ_n and ℤ_{p_1} × ℤ_{p_2} × ··· × ℤ_{p_k}. This isomorphism is the subject of the Chinese Remainder Theorem, which we formulate in the following theorems.



19 In fact, we do not need to require that p and q are primes; it is sufficient to assume 
that p and q are coprimes. 







Theorem 2.2.4.33 (Chinese Remainder Theorem, first version). Let m = m_1 · m_2 ··· m_k, k ∈ ℕ^+, where m_i ∈ ℕ_{≥2} are pairwise coprimes (i.e., gcd(m_i, m_j) = 1 for i ≠ j). Then, for any sequence of integers r_1 ∈ ℤ_{m_1}, r_2 ∈ ℤ_{m_2}, ..., r_k ∈ ℤ_{m_k}, there is an integer r such that

$$r \equiv r_i \pmod{m_i}$$

for every i ∈ {1, ..., k}, and this integer r is unique in ℤ_m.

Proof. We first show that there exists at least one such r. Since gcd(m_i, m_j) = 1 for i ≠ j, gcd(m/m_l, m_l) = 1 for every l ∈ {1, 2, ..., k}. It follows that there exists a multiplicative inverse n_l of (m/m_l) mod m_l in the group ℤ_{m_l}^*. Consider, for i = 1, ..., k,

$$e_i = n_i \cdot \frac{m}{m_i} = m_1 \cdot m_2 \cdots m_{i-1} \cdot n_i \cdot m_{i+1} \cdot m_{i+2} \cdots m_k.$$

For every j ∈ {1, ..., k} − {i}, m/m_i ≡ 0 (mod m_j), and so

$$e_i \bmod m_j = 0.$$

Since n_i is the multiplicative inverse of m/m_i in ℤ_{m_i},

$$e_i \bmod m_i = \left(n_i \cdot \frac{m}{m_i}\right) \bmod m_i = 1.$$

Now, we set

$$r = \left(\sum_{i=1}^{k} r_i \cdot e_i\right) \bmod m,$$

and we see that r has the required properties.

To see that r is uniquely determined modulo m, let us assume that there exist two integers x and y satisfying

$$y \equiv x \equiv r_i \pmod{m_i}$$

for every i ∈ {1, ..., k}. Then,

$$x - y \equiv 0 \pmod{m_i}$$

for every i ∈ {1, ..., k}, i.e., every m_i divides x − y. Since m = m_1 · m_2 ··· m_k and gcd(m_i, m_j) = 1 for i ≠ j, the Fundamental Theorem of Arithmetic implies that m divides x − y, i.e., x ≡ y (mod m). □
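The existence part of the proof is constructive. The following is a Python sketch of exactly this construction, with a hypothetical function name, and it assumes pairwise coprime moduli. It obtains the numbers n_i with pow(x, -1, m), which computes a modular inverse in Python 3.8 and later.

```python
from math import prod


def crt(residues, moduli):
    """Return the unique r in Z_m, m = m_1 * ... * m_k, with r ≡ r_i (mod m_i) for every i.

    Follows the proof of Theorem 2.2.4.33; the moduli must be pairwise coprime.
    """
    m = prod(moduli)
    r = 0
    for r_i, m_i in zip(residues, moduli):
        cofactor = m // m_i            # m / m_i, coprime to m_i
        n_i = pow(cofactor, -1, m_i)   # inverse of m / m_i modulo m_i (Python 3.8+)
        e_i = n_i * cofactor           # e_i ≡ 1 (mod m_i) and e_i ≡ 0 (mod m_j) for j != i
        r += r_i * e_i
    return r % m


print(crt([2, 3, 2], [3, 5, 7]))  # 23: indeed 23 = 7*3 + 2 = 4*5 + 3 = 3*7 + 2
```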

The first version of the Chinese Remainder Theorem can be viewed as a statement about the solutions of systems of congruences. The second version can be regarded as a theorem about the structure of ℤ_m. Because the proof idea has already been explained above for the isomorphism between ℤ_{p·q} and ℤ_p × ℤ_q, we leave the proof of the second version of the Chinese Remainder Theorem as an exercise to the reader.







Theorem 2.2.4.34 (Chinese Remainder Theorem, second version). Let m = m_1 · m_2 ··· m_k, k ∈ ℕ^+, where m_i ∈ ℕ_{≥2} are pairwise coprimes (i.e., gcd(m_i, m_j) = 1 for i ≠ j). Then ℤ_m is isomorphic to ℤ_{m_1} × ℤ_{m_2} × ··· × ℤ_{m_k}.

Exercise 2.2.4.35. Prove the second version of the Chinese Remainder The- 
orem. □ 

The theorems stated above are fundamental results of number theory, and we will use them in Chapter 5 to design efficient randomized algorithms for primality testing. For the development of (randomized) algorithms for problems arising from number theory, one often needs fundamental results on (finite) groups, especially on ℤ_n^*. Hence we present some of the basic assertions in what follows.

Since ℤ_n^* = {a ∈ ℤ_n | gcd(a, n) = 1}, we first look for some basic properties of greatest common divisors. We start with an equivalent definition of the greatest common divisor of two numbers.

Theorem 2.2.4.36. Let a, b ∈ ℕ − {0}, and let

$$\mathrm{Com}(a, b) = \{ax + by \mid x, y \in \mathbb{Z}\}$$

be the set of linear combinations of a and b. Then

$$\gcd(a, b) = \min\{d \in \mathrm{Com}(a, b) \mid d \geq 1\},$$

i.e., gcd(a, b) is the smallest positive integer from Com(a, b).

Proof. Let h = min{d ∈ Com(a, b) | d ≥ 1}, and let h = ax + by for some x, y ∈ ℤ. We prove h = gcd(a, b) by proving the inequalities h ≤ gcd(a, b) and h ≥ gcd(a, b) separately.

First we show that h divides both a and b, and so it divides gcd(a, b). Following the definition of modulo h, we have

$$a \bmod h = a - \lfloor a/h \rfloor \cdot h = a - \lfloor a/h \rfloor \cdot (ax + by) = a \cdot (1 - \lfloor a/h \rfloor x) + b \cdot (-\lfloor a/h \rfloor y),$$

and so a mod h is a linear combination of a and b. Since h is the smallest positive linear combination of a and b, and 0 ≤ a mod h < h, we obtain

$$a \bmod h = 0, \text{ i.e., } h \text{ divides } a.$$

The same argument shows that h divides b. Thus, by fact (iv) of Observation 2.2.4.20, h divides gcd(a, b), and so

$$h \leq \gcd(a, b).$$

Since gcd(a, b) divides both a and b, gcd(a, b) must divide au + bv for all u, v ∈ ℤ (Lemma 2.2.4.19(i)), i.e., gcd(a, b) divides every element in Com(a, b). Since h ∈ Com(a, b), gcd(a, b) divides h, i.e.,

$$\gcd(a, b) \leq h.$$

□
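You can observe Theorem 2.2.4.36 numerically. The Python sketch below uses a hypothetical function name. It searches only the finite window |x|, |y| ≤ max(a, b) of Com(a, b). That window is an assumption of the sketch, and it is large enough for the small inputs used here.

```python
from math import gcd


def smallest_positive_combination(a, b):
    """min{d in Com(a, b) | d >= 1}, searched over the window |x|, |y| <= max(a, b)."""
    bound = max(a, b)
    window = range(-bound, bound + 1)
    return min(a * x + b * y for x in window for y in window if a * x + b * y >= 1)


print(smallest_positive_combination(24, 30), gcd(24, 30))  # 6 6, e.g. 24*(-1) + 30*1 = 6
```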







Theorem 2.2.4.37. Let a, n ∈ ℕ − {0}, n ≥ 2, and let gcd(a, n) = 1. Then the congruence

$$ax \equiv 1 \pmod{n}$$

has a solution x ∈ ℤ_n.

Proof. Following Theorem 2.2.4.36, there exist u, v ∈ ℤ such that

$$a \cdot u + n \cdot v = 1 = \gcd(a, n).$$

Observe that, for every k ∈ ℤ,

$$a \cdot u + n \cdot v = a \cdot (u + kn) + n \cdot (v - ka),$$

and so

$$a \cdot (u + kn) + n \cdot (v - ka) = 1.$$

Certainly there is an l ∈ ℤ such that u + ln ∈ ℤ_n. Set x = u + ln. Since

$$a \cdot (u + ln) \equiv a \cdot (u + ln) + n \cdot (v - la) = 1 \pmod{n},$$

x is a solution of the congruence ax ≡ 1 (mod n). □
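The proof suggests how to compute the solution: find u, v with a · u + n · v = 1 and shift u by a multiple of n into ℤ_n. The theorem does not say how to obtain u and v. The sketch below uses the extended Euclidean algorithm for this, a standard extension of Algorithm 2.2.4.23 that is chosen here and is not introduced in the text. The function names are hypothetical.

```python
def extended_euclid(a, b):
    """Return (g, u, v) with g = gcd(a, b) = a*u + b*v."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        # each pair keeps the invariant  remainder = a*(u-coefficient) + b*(v-coefficient)
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def solve_inverse(a, n):
    """The solution x in Z_n of a*x ≡ 1 (mod n), assuming gcd(a, n) = 1."""
    g, u, _ = extended_euclid(a, n)
    if g != 1:
        raise ValueError("gcd(a, n) != 1, so no solution exists")
    return u % n   # u + l*n for the unique l with u + l*n in Z_n


print(extended_euclid(24, 30))  # (6, -1, 1): 24*(-1) + 30*1 = 6
print(solve_inverse(7, 12))     # 7, since 7*7 = 49 = 4*12 + 1
```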

Exercise 2.2.4.38. Let a, n satisfy the assumptions of Theorem 2.2.4.37. Prove that the solution x of the congruence ax ≡ 1 (mod n) is unique in ℤ_n.

Exercise 2.2.4.39. Let a, n ∈ ℕ − {0}, n ≥ 2, and let gcd(a, n) = 1. Prove that for every b ∈ ℤ_n, the congruence ax ≡ b (mod n) has a unique solution x ∈ ℤ_n.

Now we are ready to prove one of the most useful facts for developing randomized algorithms for primality testing. Later we extend the following assertion to any positive integer n.

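Theorem 2.2.4.40. For every prime n, (ℤ_n^*, ⊙_{mod n}) is a commutative group.

Proof. Let a, b be arbitrary elements of ℤ_n^*. Following the definition of ℤ_n^*, we have gcd(a, n) = gcd(b, n) = 1. By Theorem 2.2.4.36, 1 ∈ Com(a, n) and 1 ∈ Com(b, n), i.e., a · u + n · v = 1 and b · s + n · t = 1 for some u, v, s, t ∈ ℤ. Multiplying these two equalities yields

$$1 = ab \cdot (us) + n \cdot (aut + bsv + nvt),$$

so 1 ∈ Com(ab, n), and Theorem 2.2.4.36 implies gcd(ab, n) = 1. By Theorem 2.2.4.22, gcd(n, ab mod n) = gcd(ab, n) = 1, and since gcd(0, n) = n ≥ 2, ab mod n ≠ 0. Thus, a ⊙_{mod n} b = ab mod n ∈ ℤ_n^*, i.e., ℤ_n^* is closed under multiplication modulo n. This implies that (ℤ_n^*, ⊙_{mod n}) is an algebra.

Obviously, 1 is the identity element with respect to ⊙_{mod n}, and ⊙_{mod n} is an associative and commutative operation.

Theorem 2.2.4.37 ensures the existence of an inverse element x = a^{−1} ∈ ℤ_n for every a ∈ ℤ_n^* (i.e., for every a ∈ ℤ_n with gcd(a, n) = 1). Since ax ≡ 1 (mod n), we have 1 ∈ Com(x, n), and so x ∈ ℤ_n^* by Theorem 2.2.4.36. Following Definition 2.2.4.1, the proof is completed. □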
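A brute-force check of the group axioms for (ℤ_n^*, ⊙_{mod n}) is easy to write. The Python sketch below uses hypothetical names. The check also passes for composite n, which is consistent with the extension of Theorem 2.2.4.40 announced just before it.

```python
from math import gcd


def units(n):
    """Z_n^* = {a in Z_n - {0} | gcd(a, n) = 1}."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def is_multiplicative_group(n):
    """Check closure, the identity 1 and inverses in (Z_n^*, *_mod n); associativity and
    commutativity are inherited from integer multiplication."""
    elements = set(units(n))
    closed = all((a * b) % n in elements for a in elements for b in elements)
    inverses = all(any((a * b) % n == 1 for b in elements) for a in elements)
    return closed and 1 in elements and inverses


print(units(12))                                              # [1, 5, 7, 11]
print(all(is_multiplicative_group(n) for n in range(2, 50)))  # True
```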







Exercise 2.2.4.41. Let (A, *) be a group. Prove the following facts.

(i) For every a ∈ A, a = (a^{−1})^{−1}.

(ii) For all a, b, c ∈ A,
a * b = c * b implies a = c, and
b * a = b * c implies a = c.

(iii) For all a, b, c ∈ A,

□



Definition 2.2.4.42. Let (A, ∘) be a group with the identity element 1. For every a ∈ A, the order of a, order(a), is the smallest r ∈ ℕ − {0} such that

$$a^r = 1,$$

if such an r exists. If a^i ≠ 1 for all i ∈ ℕ − {0}, then we set order(a) = ∞.

We show that in Definition 2.2.4.42 the case order(a) = ∞ cannot occur for finite groups, i.e., that every element of a finite group (A, ∘) has a finite order. Consider the |A| + 1 elements

$$a^0, a^1, a^2, \ldots, a^{|A|}$$

from A. Since A has only |A| elements, there must exist 0 ≤ i < j ≤ |A| such that

$$a^i = a^j.$$

This implies

$$1 = a^i \circ (a^{-1})^i = a^j \circ (a^{-1})^i = a^{j-i},$$

and so order(a) ≤ j − i, i.e., order(a) ∈ {1, 2, ..., |A|}.
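The pigeonhole argument bounds the order of an element by |A|. For the groups (ℤ_n^*, ⊙_{mod n}) you can find the order simply by multiplying until 1 appears. The following is a Python sketch with a hypothetical function name.

```python
def order_mod(a, n):
    """order(a) in (Z_n^*, *_mod n): the smallest r >= 1 with a^r ≡ 1 (mod n).

    Raises ValueError if a is not invertible modulo n (then no such r exists).
    """
    r, power = 1, a % n
    while power != 1:
        if r >= n:                      # orders in Z_n^* are at most |Z_n^*| <= n - 1
            raise ValueError("a is not an element of Z_n^*")
        power = (power * a) % n
        r += 1
    return r


print({a: order_mod(a, 7) for a in range(1, 7)})  # {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
```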

Definition 2.2.4.43. Let (A, ∘) be a group. An algebra (H, ∘) is called a subgroup of (A, ∘) if (H, ∘) is a group and H ⊆ A.

For instance, (ℤ, +) is a subgroup of (ℚ, +), and ({1}, ⊙_{mod 5}) is a subgroup of (ℤ_5^*, ⊙_{mod 5}). But (ℤ_5^*, ⊙_{mod 5}) is not a subgroup of (ℤ_7^*, ⊙_{mod 7}), because ⊙_{mod 5} and ⊙_{mod 7} are different operations (4 ⊙_{mod 5} 4 = 1 and 4 ⊙_{mod 7} 4 = 2).

Lemma 2.2.4.44. Let $(H, \circ)$ be a subgroup of a group $(A, \circ)$. Then the identity elements of both groups are the same.

Proof. Let $e_H$ be the identity of $(H, \circ)$, and let $e_A$ be the identity of $(A, \circ)$. Since $e_H$ is the identity of $(H, \circ)$,

$$e_H \circ e_H = e_H. \qquad (2.19)$$

Since $e_A$ is the identity of $(A, \circ)$ and $e_H \in A$,



2 Elementary Fundamentals 



$$e_A \circ e_H = e_H. \qquad (2.20)$$

Since the right-hand sides of (2.19) and (2.20) are equal, so are the left-hand sides, i.e.,

$$e_H \circ e_H = e_A \circ e_H. \qquad (2.21)$$

If $e_H^{-1}$ is the inverse of $e_H$ in $(A, \circ)$, then multiplying (2.21) by $e_H^{-1}$ from the right we obtain

$$e_H = e_H \circ e_H \circ e_H^{-1} = e_A \circ e_H \circ e_H^{-1} = e_A.$$

□

Theorem 2.2.4.45. Let $(A, \circ)$ be a finite group. Every algebra $(H, \circ)$ with $H \subseteq A$ is a subgroup of $(A, \circ)$.

Proof. Let $H \subseteq A$, and let $(H, \circ)$ be an algebra. To prove that $(H, \circ)$ is a subgroup of $(A, \circ)$, it is sufficient to show that $(H, \circ)$ is a group, i.e., that $e_A$ is the identity of $(H, \circ)$ and that every $b \in H$ has its inverse $b^{-1}$ in $H$.

Let $b$ be an arbitrary element of $H$. Since $b \in A$ and $A$ is finite, $order(b) \in \mathbb{N} - \{0\}$. Thus

$$b^{order(b)} = e_A.$$

Since $b^i \in H$ for all positive integers $i$ (remember that $H$ is closed under $\circ$), $e_A \in H$. Since

$$e_A \circ d = d$$

for every $d \in A$ and $H \subseteq A$, $e_A$ is the identity of $(H, \circ)$, too.

Since $b^{order(b)-1} \in H$ for any $b \in H$ (for $order(b) = 1$ this element is $b^0 = e_A \in H$) and

$$e_A = b^{order(b)} = b \circ b^{order(b)-1},$$

$b^{order(b)-1}$ is the inverse element of $b$ in $(H, \circ)$. □

Theorem 2.2.4.45 is a useful instrument when working with groups, because in order to prove that $(H, \circ)$ is a subgroup of a finite group $(A, \circ)$, it is sufficient to show that $H \subseteq A$ and that $H$ is closed under $\circ$.

Note that the assumption of Theorem 2.2.4.45 that $A$ is finite is essential, because $(\mathbb{N}, +)$ is an algebra, but $(\mathbb{N}, +)$ is not a subgroup of the group $(\mathbb{Z}, +)$ (for instance, $1 \in \mathbb{N}$ has no inverse in $\mathbb{N}$).

Exercise 2.2.4.46. Let $(H, \circ)$ and $(G, \circ)$ be two subgroups of a group $(A, \circ)$. Prove that $(H \cap G, \circ)$ is a subgroup of $(A, \circ)$. □

Lemma 2.2.4.47. Let $(A, \circ)$ be a group with the identity $e$, and let $a \in A$ be an element with a finite order. Then, for $H(a) = \{e, a, a^2, \ldots, a^{order(a)-1}\}$, $(H(a), \circ)$ is the smallest subgroup of $(A, \circ)$ that contains $a$.

Proof. First, we prove that $H(a)$ is closed under $\circ$. Let $a^i$ and $a^j$ be two arbitrary elements of $H(a)$. If $i + j < order(a)$, then

$$a^i \circ a^j = a^{i+j} \in H(a).$$

If $i + j \ge order(a)$, then







$$a^i \circ a^j = a^{i+j} = a^{order(a)} \circ a^{i+j-order(a)} = e \circ a^{i+j-order(a)} = a^{i+j-order(a)} \in H(a).$$

Following the definition of $H(a)$, $e \in H(a)$. For every element $a^i \in H(a)$ with $i \ge 1$,

$$e = a^{order(a)} = a^i \circ a^{order(a)-i},$$

i.e., $a^{order(a)-i}$ is the inverse element to $a^i$ (and $e$ is its own inverse).

Since every algebra $(G, \circ)$ with $a \in G$ must contain $H(a)$, $(H(a), \circ)$ is the smallest subgroup of $(A, \circ)$ that contains $a$. □
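Lemma 2.2.4.47 can be tried out in the group $(\mathbb{Z}_n^*, \odot_n)$ studied later in this section. The following Python sketch (the function names are our own, hypothetical choices) generates $H(a)$ by repeated multiplication modulo $n$ until the identity 1 reappears, and then checks closure and inverses by brute force. It assumes $n \ge 2$ and $\gcd(a, n) = 1$; otherwise the powers of $a$ never return to 1.

```python
from math import gcd


def cyclic_subgroup(a: int, n: int) -> list[int]:
    """Return H(a) = [1, a, a^2, ..., a^(order(a)-1)] in (Z_n^*, multiplication mod n).

    Assumes n >= 2 and gcd(a, n) == 1; otherwise a^i never returns to 1.
    """
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n >= 2 and gcd(a, n) == 1")
    elems = [1]
    x = a % n
    while x != 1:              # stop as soon as a^order(a) = 1 is reached
        elems.append(x)
        x = (x * a) % n
    return elems


def is_subgroup(H: list[int], n: int) -> bool:
    """Brute force: H is closed under multiplication mod n and every element has an inverse in H."""
    S = set(H)
    closed = all((x * y) % n in S for x in S for y in S)
    inverses = all(any((x * y) % n == 1 for y in S) for x in S)
    return closed and inverses
```

For instance, `cyclic_subgroup(2, 7)` yields `[1, 2, 4]`, so $order(2) = 3$ in $(\mathbb{Z}_7^*, \odot_7)$.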

Definition 2.2.4.48. Let $(H, \circ)$ be a subgroup of a group $(A, \circ)$. For every $b \in A$, the set

$$H \circ b = \{h \circ b \mid h \in H\}$$

is called a right coset of $H$ in $(A, \circ)$, and the set

$$b \circ H = \{b \circ h \mid h \in H\}$$

is called a left coset of $H$ in $(A, \circ)$. If $H \circ b = b \circ H$, then $H \circ b$ is called a coset of $H$ in $(A, \circ)$.

For instance, $(\{7 \cdot a \mid a \in \mathbb{Z}\}, +)$ is a subgroup of $(\mathbb{Z}, +)$. Let $B_7 = \{7 \cdot a \mid a \in \mathbb{Z}\}$. Then

$$B_7 + i = i + B_7 = \{7 \cdot a + i \mid a \in \mathbb{Z}\} = \{b \in \mathbb{Z} \mid b \bmod 7 = i\}$$

are cosets of $B_7$ in $(\mathbb{Z}, +)$ for $i = 0, 1, \ldots, 6$. Observe that $\{B_7 + i \mid i = 0, 1, \ldots, 6\}$ is a partition of $\mathbb{Z}$ into 7 disjoint classes.

Observation 2.2.4.49. If $(H, \circ)$ is a subgroup of a commutative group $(A, \circ)$, then all right cosets (left cosets) of $H$ in $(A, \circ)$ are cosets.

An important fact about cosets $H \circ b$ is that their size is always equal to the size of $H$.

Theorem 2.2.4.50. Let $(H, \circ)$ be a subgroup of $(A, \circ)$. Then the following facts hold.

(i) $H \circ h = H$ for all $h \in H$.

(ii) For all $b, c \in A$,

either $H \circ b = H \circ c$ or $H \circ b \cap H \circ c = \emptyset$.

(iii) If $H$ is finite, then

$$|H \circ b| = |H|$$

for all $b \in A$.

Proof. We prove these three claims separately. Let $e$ be the identity of $(A, \circ)$ and $(H, \circ)$.







(i) Let $h \in H$. Since $H$ is closed under $\circ$, we obtain $a \circ h \in H$ for every $a \in H$, i.e.,

$$H \circ h \subseteq H.$$

Since $(H, \circ)$ is a group, $h^{-1} \in H$. Let $b$ be an arbitrary element of $H$. Then

$$b = b \circ e = b \circ (h^{-1} \circ h) = (b \circ h^{-1}) \circ h \in H \circ h,$$

because $b \circ h^{-1} \in H$, i.e.,

$$H \subseteq H \circ h.$$

Thus $H \circ h = H$.

(ii) Let $H \circ b \cap H \circ c \neq \emptyset$ for some $b, c \in A$. Then there exist $a_1, a_2 \in H$ such that

$$a_1 \circ b = a_2 \circ c.$$

This implies $c = a_2^{-1} \circ a_1 \circ b$, where $a_2^{-1} \in H$. Then

$$H \circ c = H \circ (a_2^{-1} \circ a_1 \circ b) = H \circ (a_2^{-1} \circ a_1) \circ b. \qquad (2.22)$$

Since $a_2^{-1}, a_1 \in H$, the element $a_2^{-1} \circ a_1$ belongs to $H$, too. This implies, because of (i), that

$$H \circ (a_2^{-1} \circ a_1) = H. \qquad (2.23)$$

Thus, combining (2.22) and (2.23), we obtain

$$H \circ c = H \circ (a_2^{-1} \circ a_1) \circ b = H \circ b.$$

(iii) Let $H$ be finite, and let $b \in A$. Since $H \circ b = \{h \circ b \mid h \in H\}$, we immediately have

$$|H \circ b| \le |H|.$$

Let $H = \{h_1, h_2, \ldots, h_k\}$ for some $k \in \mathbb{N}$. We have to show that

$$|\{h_1 \circ b, h_2 \circ b, \ldots, h_k \circ b\}| \ge k, \text{ i.e., that } h_i \circ b \neq h_j \circ b$$

for all $i, j \in \{1, \ldots, k\}$ with $i \neq j$. Since $(A, \circ)$ is a group, $b^{-1} \in A$, and so $h_i \circ b = h_j \circ b$ would imply $h_i \circ b \circ b^{-1} = h_j \circ b \circ b^{-1}$. Thus,

$$h_i = h_i \circ (b \circ b^{-1}) = h_j \circ (b \circ b^{-1}) = h_j$$

would contradict the assumption $h_i \neq h_j$.

□



As a consequence of Theorem 2.2.4.50, the set $A$ of every group $(A, \circ)$ that has a proper subgroup $(H, \circ)$ can be partitioned into pairwise disjoint subsets of $A$, namely the left (right) cosets of $H$ in $(A, \circ)$.

Theorem 2.2.4.51. Let $(H, \circ)$ be a subgroup of a group $(A, \circ)$. Then $\{H \circ b \mid b \in A\}$ is a partition of $A$.







Proof. Claim (ii) of Theorem 2.2.4.50 shows that $H \circ b \cap H \circ c = \emptyset$ or $H \circ b = H \circ c$. So, it remains to show that $A \subseteq \bigcup_{b \in A} H \circ b$. But this is obvious, because the identity $e$ of $(A, \circ)$ is also the identity of $(H, \circ)$, and so $b = e \circ b \in H \circ b$ for every $b \in A$. □
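Theorem 2.2.4.51 can be observed on a small example. In the following Python sketch (the names are illustrative assumptions, not taken from the text) we take $A = \mathbb{Z}_{21}^*$ with multiplication modulo 21 and the subgroup $H = H(4) = \{1, 4, 16\}$ in the sense of Lemma 2.2.4.47, compute all distinct right cosets $H \circ b$, and check that they have equal size $|H|$ (Theorem 2.2.4.50(iii)), are pairwise disjoint, and cover $A$.

```python
from math import gcd


def units(n: int) -> list[int]:
    """Z_n^* = {a in {1, ..., n-1} | gcd(a, n) = 1}, for n >= 2."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def cyclic_subgroup(a: int, n: int) -> frozenset[int]:
    """H(a) = {1, a, a^2, ...} in (Z_n^*, multiplication mod n); assumes gcd(a, n) == 1."""
    elems, x = {1}, a % n
    while x != 1:
        elems.add(x)
        x = (x * a) % n
    return frozenset(elems)


def right_cosets(H: frozenset[int], n: int) -> set[frozenset[int]]:
    """All distinct right cosets H o b = {h * b mod n | h in H} for b in Z_n^*."""
    return {frozenset((h * b) % n for h in H) for b in units(n)}


n = 21
H = cyclic_subgroup(4, n)        # {1, 4, 16}
cosets = right_cosets(H, n)

# Equal sizes, total size |Z_21^*| and union Z_21^* together mean: a partition into cosets of size |H|.
assert all(len(C) == len(H) for C in cosets)
assert sum(len(C) for C in cosets) == len(units(n))
assert set().union(*cosets) == set(units(n))
```

Here $|\mathbb{Z}_{21}^*| = 12$ splits into 4 cosets of size 3.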

Definition 2.2.4.52. Let $(H, \circ)$ be a subgroup of a group $(A, \circ)$. We define the index of $H$ in $(A, \circ)$ as

$$Index_H(A) = |\{H \circ b \mid b \in A\}|,$$

i.e., as the number of different right cosets of $H$ in $(A, \circ)$.

The following theorem of Lagrange is the main reason for our study of group theory. It provides a powerful instrument for proving that there are not too many "bad" elements with some special properties in a group $(A, \circ)$, because it is sufficient to show that all "bad" elements are in a proper subgroup of $(A, \circ)$. Its consequence, Corollary 2.2.4.54, states that the size of any proper subgroup of a finite group $(A, \circ)$ is at most $|A|/2$.

Theorem 2.2.4.53 (Lagrange's Theorem). For any subgroup $(H, \circ)$ of a finite group $(A, \circ)$,

$$|A| = Index_H(A) \cdot |H|,$$

i.e., $|H|$ divides $|A|$.

Proof. Following Theorem 2.2.4.51, $A$ can be divided into $Index_H(A)$ right cosets, which are pairwise disjoint and, by Theorem 2.2.4.50(iii), all of the same size $|H|$. □

Corollary 2.2.4.54. Let $(H, \circ)$ be a subalgebra of a finite group $(A, \circ)$. If $H \subsetneq A$, then

$$|H| \le |A|/2.$$

Proof. Theorem 2.2.4.45 ensures that $(H, \circ)$ is a subgroup of $(A, \circ)$. Following Lagrange's Theorem,

$$|A| = Index_H(A) \cdot |H|.$$

Since $H \subsetneq A$, $1 \le |H| < |A|$, and so

$$Index_H(A) \ge 2, \text{ i.e., } |H| = |A| / Index_H(A) \le |A|/2.$$

□

Corollary 2.2.4.55. Let $(A, \circ)$ be a finite group. Then, for every element $a \in A$, the order of $a$ divides $|A|$.

Proof. Let $a$ be an arbitrary element of $A$. Following Lemma 2.2.4.47, $(H(a), \circ)$ with $H(a) = \{e, a, a^2, \ldots, a^{order(a)-1}\} = \{a, a^2, \ldots, a^{order(a)}\}$ is a subgroup of $(A, \circ)$.

Since $|H(a)| = order(a)$, Lagrange's Theorem implies

$$|A| = Index_{H(a)}(A) \cdot order(a).$$

□
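Corollary 2.2.4.55 is easy to test numerically for the groups $(\mathbb{Z}_n^*, \odot_n)$ (that these are groups for every $n$ is Theorem 2.2.4.56 below). The following Python sketch is a brute-force illustration with hypothetical helper names, adequate only for small $n$: it computes $order(a)$ directly from Definition 2.2.4.42 and checks that it divides $|\mathbb{Z}_n^*|$ for every $a \in \mathbb{Z}_n^*$.

```python
from math import gcd


def units(n: int) -> list[int]:
    """Z_n^* for n >= 2."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def order(a: int, n: int) -> int:
    """Smallest r >= 1 with a^r = 1 (mod n); assumes n >= 2 and gcd(a, n) == 1."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r


def orders_divide_group_size(n: int) -> bool:
    """Check Corollary 2.2.4.55 in (Z_n^*, multiplication mod n)."""
    size = len(units(n))
    return all(size % order(a, n) == 0 for a in units(n))


if __name__ == "__main__":
    print({a: order(a, 15) for a in units(15)})
```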







In Chapter 5 we will often work with the sets

$$\mathbb{Z}_n^* = \{a \in \mathbb{Z}_n \mid \gcd(a, n) = 1\}$$

for $n \in \mathbb{N} - \{0\}$. In Theorem 2.2.4.40 we proved that $(\mathbb{Z}_p^*, \odot_p)$ is a commutative group for every prime $p$. Note that $\mathbb{Z}_p^* = \mathbb{Z}_p - \{0\} = \{1, 2, \ldots, p-1\}$ for every prime $p$. We now prove that $(\mathbb{Z}_n^*, \odot_n)$ is a group for every $n \in \mathbb{N} - \{0\}$.

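Theorem 2.2.4.56. For every $n \in \mathbb{N} - \{0\}$, $(\mathbb{Z}_n^*, \odot_n)$ is a commutative group.

Proof. First of all, we have to show that $\mathbb{Z}_n^*$ is closed under $\odot_n$. Let $a, b$ be arbitrary elements of $\mathbb{Z}_n^* = \{a \in \mathbb{Z}_n \mid \gcd(a, n) = 1\}$. We have to show that $a \odot_n b \in \mathbb{Z}_n^*$, i.e., that $\gcd(a \odot_n b, n) = 1$. Let us assume the contrary, i.e., that $\gcd(a \odot_n b, n) = k$ for some $k \ge 2$. Then $n = k \cdot v$ for some $v \in \mathbb{N} - \{0\}$ and $a \cdot b \bmod n = k \cdot d$ for some $d \in \mathbb{N}$. This implies

$$a \cdot b \bmod n = k \cdot d, \text{ i.e., } a \cdot b = k \cdot v \cdot s + k \cdot d$$

for some $s \in \mathbb{N}$. Since $kvs + kd = k(vs + d)$, $k$ divides $a \cdot b$. Let $k = p \cdot m$, where $p$ is a prime. Since the prime $p$ divides $a \cdot b$, either $p$ divides $a$ or $p$ divides $b$. But this contradicts the facts $n = p \cdot m \cdot v$, $\gcd(a, n) = 1$, and $\gcd(b, n) = 1$. Thus, $a \cdot b \bmod n \in \mathbb{Z}_n^*$.

Clearly, 1 is the identity element of $(\mathbb{Z}_n^*, \odot_n)$. Let $a$ be an arbitrary element of $\mathbb{Z}_n^*$. To prove that there exists an inverse $a^{-1}$ with $a \odot_n a^{-1} = 1$, it is sufficient to show that

$$|\{a \odot_n 1, a \odot_n 2, \ldots, a \odot_n (n-1)\}| = n - 1,$$

which implies $1 \in \{a \odot_n 1, \ldots, a \odot_n (n-1)\}$: these $n - 1$ values are then pairwise distinct elements of $\mathbb{Z}_n - \{0\}$ (none of them is 0, because $n$ dividing $a \cdot i$ together with $\gcd(a, n) = 1$ would give that $n$ divides $i$, which is impossible for $0 < i < n$). Let us assume the contrary. Let there exist $i, j$, $i > j$, such that

$$a \odot_n i = a \odot_n j, \text{ i.e., } a \cdot i \equiv a \cdot j \pmod{n}.$$

This implies

$$a \cdot i = n \cdot k_1 + z \text{ and } a \cdot j = n \cdot k_2 + z$$

for some $k_1, k_2, z \in \mathbb{N}$ with $z < n$. Then

$$a \cdot i - a \cdot j = n \cdot k_1 - n \cdot k_2 = n(k_1 - k_2),$$

and so

$$a \cdot (i - j) = n(k_1 - k_2), \text{ i.e., } n \text{ divides } a \cdot (i - j).$$

Since $\gcd(a, n) = 1$, $n$ must divide $(i - j)$. But this is impossible, because $0 < i - j < n$. Thus, we can infer that $(\mathbb{Z}_n^*, \odot_n)$ is a group. Since $\odot_n$ is a commutative operation, $(\mathbb{Z}_n^*, \odot_n)$ is a commutative group. □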
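Theorem 2.2.4.56 can be confirmed by exhaustive search for small moduli. The following Python sketch (the function names are our own) checks closure, the identity 1, inverses, and commutativity of multiplication modulo $n$ on a given set; associativity is inherited from integer multiplication and is not tested. As a contrast, $\mathbb{Z}_6 - \{0\}$ fails the check because it is not closed ($2 \odot_6 3 = 0$).

```python
from math import gcd


def is_commutative_group(G: set[int], n: int) -> bool:
    """Brute-force check that (G, multiplication mod n) is a commutative group.

    Associativity is inherited from integer multiplication, so it is not tested.
    """
    closed = all((a * b) % n in G for a in G for b in G)
    has_identity = 1 in G
    has_inverses = all(any((a * x) % n == 1 for x in G) for a in G)
    commutative = all((a * b) % n == (b * a) % n for a in G for b in G)
    return closed and has_identity and has_inverses and commutative


def units(n: int) -> set[int]:
    """Z_n^* = {a | 1 <= a < n, gcd(a, n) = 1}, for n >= 2."""
    return {a for a in range(1, n) if gcd(a, n) == 1}


if __name__ == "__main__":
    print(all(is_commutative_group(units(n), n) for n in range(2, 50)))   # Theorem 2.2.4.56
    print(is_commutative_group(set(range(1, 6)), 6))                      # Z_6 - {0}: not closed
```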







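Next we show that $\mathbb{Z}_n^*$ contains all elements of $\mathbb{Z}_n$ that have an inverse element with respect to $\odot_{\bmod n}$, i.e., that

$$\mathbb{Z}_n^* = \{a \in \mathbb{Z}_n \mid \gcd(a, n) = 1\} = \{a \in \mathbb{Z}_n \mid \exists a^{-1} \in \mathbb{Z}_n \text{ such that } a \odot_{\bmod n} a^{-1} = 1\}. \qquad (2.24)$$

Theorem 2.2.4.56 implies that $\mathbb{Z}_n^* \subseteq \{a \in \mathbb{Z}_n \mid \exists a^{-1} \in \mathbb{Z}_n \text{ such that } a \odot_{\bmod n} a^{-1} = 1\}$, because $(\mathbb{Z}_n^*, \odot_{\bmod n})$ is a group. Thus, the following lemma completes the proof of (2.24).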

Lemma 2.2.4.57. Let $a \in \mathbb{Z}_n$. If there exists an $a^{-1} \in \mathbb{Z}_n$ such that $a \odot_n a^{-1} = 1$, then

$$\gcd(a, n) = 1.$$

Proof. Theorem 2.2.4.36 claims that

$$\gcd(a, n) = \min\{d \in \mathbb{N} - \{0\} \mid d = a x + n y \text{ for } x, y \in \mathbb{Z}\}.$$

Let there exist an element $a^{-1}$ with $a \odot_{\bmod n} a^{-1} = 1$. So $a \cdot a^{-1} \equiv 1 \pmod{n}$, i.e.,

$$a \cdot a^{-1} = k \cdot n + 1 \text{ for some } k \in \mathbb{N}.$$

Choosing $x = a^{-1}$ and $y = -k$, we obtain

$$a \cdot a^{-1} + n \cdot (-k) = k \cdot n + 1 - k \cdot n = 1 \in Com(a, n),$$

and so $\gcd(a, n) = 1$. □
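The proof of Lemma 2.2.4.57 turns an inverse into integers $x, y$ with $a x + n y = 1$. Conversely, the extended Euclidean algorithm computes such $x, y$ and hence the inverse itself. The following recursive Python sketch (helper names are ours) returns $a^{-1} \in \mathbb{Z}_n^*$, or `None` when $\gcd(a, n) \neq 1$, which by (2.24) is exactly the case in which no inverse exists. In Python 3.8 and later, the built-in `pow(a, -1, n)` computes the same inverse (and raises `ValueError` when none exists).

```python
from typing import Optional


def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y (extended Euclidean algorithm)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # g = b*x + (a % b)*y = b*x + (a - (a // b)*b)*y = a*y + b*(x - (a // b)*y)
    return g, y, x - (a // b) * y


def inverse_mod(a: int, n: int) -> Optional[int]:
    """Inverse of a in Z_n^* if gcd(a, n) == 1, otherwise None (assumes n >= 2)."""
    g, x, _ = extended_gcd(a % n, n)
    if g != 1:
        return None
    return x % n
```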

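We conclude this section by proving a fundamental result of group theory. For any $n \in \mathbb{N} - \{0\}$, let $\varphi(n) = |\mathbb{Z}_n^*|$ be the so-called Euler's function (Euler's totient function).

Theorem 2.2.4.58 (Euler's Theorem). For all positive integers $n > 1$,

$$a^{\varphi(n)} \equiv 1 \pmod{n}$$

for all $a \in \mathbb{Z}_n^*$.

Proof. Let $a$ be an arbitrary element from $\mathbb{Z}_n^*$. Corollary 2.2.4.55 implies that $order(a)$ in $(\mathbb{Z}_n^*, \odot_n)$ divides $|\mathbb{Z}_n^*| = \varphi(n)$. Since $a^{order(a)} \equiv 1 \pmod{n}$, we obtain

$$\begin{aligned}
a^{\varphi(n)} \bmod n &= \left(a^{order(a)}\right)^{\varphi(n)/order(a)} \bmod n \\
&= \left(a^{order(a)} \bmod n\right)^{\varphi(n)/order(a)} \bmod n \\
&= 1^{\varphi(n)/order(a)} \bmod n = 1.
\end{aligned}$$

□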
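Euler's Theorem can be spot-checked with Python's built-in three-argument `pow`, which performs modular exponentiation. In the following sketch, `phi` computes $\varphi(n) = |\mathbb{Z}_n^*|$ by direct counting (an illustrative choice that is only practical for small $n$), and `check_euler` tests $a^{\varphi(n)} \bmod n = 1$ for all $a \in \mathbb{Z}_n^*$. For a prime $p$ one gets $\varphi(p) = p - 1$, which is the link to Exercise 2.2.4.59.

```python
from math import gcd


def phi(n: int) -> int:
    """Euler's function |Z_n^*|, counted directly (n >= 1)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def check_euler(n: int) -> bool:
    """Check a^phi(n) mod n == 1 for every a in Z_n^* (n >= 2)."""
    e = phi(n)
    return all(pow(a, e, n) == 1 for a in range(1, n) if gcd(a, n) == 1)
```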

Exercise 2.2.4.59. Apply Euler’s Theorem in order to give an alternative 
(algebraic) proof of Fermat’s Theorem. 







Keywords introduced in Section 2.2.4 

group, semigroup, ring, field, prime, greatest common divisor, Euclid’s algorithm, 
generator of a group, cyclic group, order of a group element, coset, index of a group 

Summary of Section 2.2.4 

The main assertions presented in Section 2.2.4 are: 

• There are infinitely many primes, and the number of primes in $\{2, 3, 4, \ldots, n\}$ is approximately $n / \ln n$ (Prime Number Theorem).

• Every integer greater than 1 can be expressed as a product of nontrivial powers of distinct primes, and this prime factorization is unique (Fundamental Theorem of Arithmetic).

• An odd integer $p \ge 3$ is prime if and only if $a^{(p-1)/2} \bmod p \in \{1, p - 1\}$ for every $a \in \{1, 2, \ldots, p - 1\}$.

• If $n = p_1 \cdot p_2 \cdots p_k$ for pairwise distinct primes $p_1, \ldots, p_k$, then $\mathbb{Z}_n$ is isomorphic to $\mathbb{Z}_{p_1} \times \mathbb{Z}_{p_2} \times \cdots \times \mathbb{Z}_{p_k}$ (Chinese Remainder Theorem).

• If $(H, \circ)$ is a subgroup of a finite group $(A, \circ)$, then $|H|$ divides $|A|$.

• The order of every element $a$ of a finite group $(A, \circ)$ divides $|A|$.

• For every $n \in \mathbb{N}$ with $n \ge 2$ and every $a \in \mathbb{Z}_n^*$, $a^{\varphi(n)} \bmod n = 1$ (Euler's Theorem).

2.2.5 Probability Theory 

Probability theory has been developed to study experiments with uncertain outcomes. The fundamental notions of probability theory are “sample space” and “elementary event”. A sample space $S$ is the set of all basic events that may happen in some experiment. Every element of $S$ is called an elementary event. Intuitively, the sample space of an experiment is the set of all results (events) that may be the outcomes of the experiment. For instance, in flipping a coin one can consider the following two outcomes: “head” and “tail”. Thus, $\{head, tail\}$ is the sample space, and $head$ and $tail$ are the elementary events. If one flips three coins (one after another or at once), then the sample space is $\{(x, y, z) \mid x, y, z \in \{head, tail\}\}$. The sequences $(head, head, tail)$, $(tail, head, tail)$, and $(head, head, head)$ are examples of elementary events. Intuitively, an elementary event is an event that cannot be expressed as a collection of smaller, more fundamental events, i.e., an elementary event cannot be partitioned into pieces from the point of view of the experiment considered.

Another example of an experiment is a fixed (randomized) algorithm. During the computation on an input $x$, such a randomized algorithm $A$ may have a choice from several possibilities on how to continue. Thus, it may execute different computations depending on the choices. From this point of view, the sample space $S_{A,x}$ is the set of all possible sequences of random choices, or equivalently the set of all computations of $A$ on $x$ (one computation for each sequence of random choices). Thus, every computation of $A$ on $x$ can be considered as an elementary event.







For our purposes it is sufficient to consider that $S$ is countable. So, all following definitions of fundamental notions assume the countability of $S$. An event is any subset of the sample space $S$. The event $S$ is called the certain event, and the event $\emptyset$ is called the null event. Two events $S_1, S_2 \subseteq S$ are called mutually exclusive if $S_1 \cap S_2 = \emptyset$. For instance, for $S = \{(x, y, z) \mid x, y, z \in \{head, tail\}\}$, $S_1 = \{(head, head, head), (tail, head, tail)\}$ and $S_2 = \{(tail, head, head), (head, tail, head), (head, head, tail)\}$ are two events that are mutually exclusive. Considering $S_{A,x}$, one may be interested in the computations of $A$ on $x$ in which the correct output is computed. Then, one considers the set $Cor(A, x)$ of all computations providing the right output as an event. $Cor(A, x)$ is mutually exclusive to the event consisting of all computations that finish with a wrong output.

In what follows we have the following scenario. For a finite sample space $S$, we want to assign a probability to every elementary event. This assignment should correspond to reality, i.e., to the experiment executed. To be fair, we have (among others) the following requirements on this assignment:

(i) Every elementary event has some non-negative probability.

(ii) The sum of the probabilities of all elementary events gives certainty (denoted by 1 in probability theory).

(iii) There is a fair way to compute the probability of any event $S_1 \subseteq S$ (among others, the probability of $S_1$ plus the probability of the complement $S - S_1$ must be certainty).

This results in the following definition.

Definition 2.2.5.1. A probability distribution $Prob$ on a sample space $S$ is a mapping from events of $S$ to real numbers ($Prob : 2^S \to \mathbb{R}^{\ge 0}$) such that the following probability axioms are satisfied:

(1) $Prob(\{x\}) \ge 0$ for every elementary event $x$,

(2) $Prob(S) = 1$,

(3) $Prob(X \cup Y) = Prob(X) + Prob(Y)$ for any two mutually exclusive events $X$ and $Y$ ($X \cap Y = \emptyset$).

(If one considers an infinite $S$, then one requires $Prob(\bigcup_{i=1}^{\infty} X_i) = \sum_{i=1}^{\infty} Prob(X_i)$ for every countably infinite sequence of mutually exclusive events $X_1, X_2, X_3, \ldots$.)

$Prob(X)$ is called the probability of the event $X$.

If one considers $S = \{head, tail\}$ for a fair coin flipping, then $Prob(\{head\}) = Prob(\{tail\}) = 1/2$. One can simply observe that the probabilities of the elementary events unambiguously determine the probabilities of all events in every sample space. The proof of the following simple observation is left to the reader.

Exercise 2.2.5.2. Prove for all events $X, Y$ of a sample space $S$ and every probability distribution $Prob$ on $S$ that







(i) $Prob(\emptyset) = 0$,

(ii) if $X \subseteq Y$, then $Prob(X) \le Prob(Y)$,

(iii) $Prob(S - X) = 1 - Prob(X)$,

(iv) $Prob(X \cup Y) = Prob(X) + Prob(Y) - Prob(X \cap Y) \le Prob(X) + Prob(Y)$.

□

In what follows we always consider that $S$ is finite or countably infinite. In this case we speak about a discrete probability distribution. If, for every elementary event $x$ of a finite $S$,

$$Prob(\{x\}) = \frac{1}{|S|},$$

then $Prob$ is called the uniform probability distribution on $S$.

Example 2.2.5.3. Consider the sample space

$$S = \{(x, y, z) \mid x, y, z \in \{head, tail\}\}$$

for the experiment of flipping three coins. If we consider fair coins, it means that $Prob(\{a\}) = \frac{1}{8}$ for every elementary event $a \in S$.

What is the probability of getting at least one head? This event is $Head = \{(head, head, head), (head, head, tail), (head, tail, head), (head, tail, tail), (tail, head, tail), (tail, tail, head), (tail, head, head)\}$. Thus,

$$Prob(Head) = \sum_{a \in Head} Prob(\{a\}) = \frac{7}{8}.$$

A more convenient way to evaluate $Prob(Head)$ is to say that $S - Head = \{(tail, tail, tail)\}$, and so $Prob(S - Head) = \frac{1}{8}$ directly implies

$$Prob(Head) = 1 - Prob(S - Head) = \frac{7}{8}.$$

□
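The computation in Example 2.2.5.3 can be reproduced by enumerating the sample space. The Python sketch below (variable names are our own) represents events as sets of elementary events, uses exact `Fraction` arithmetic for the uniform distribution $Prob(\{a\}) = 1/8$, and evaluates $Prob(Head)$ both directly and via the complement $S - Head$.

```python
from fractions import Fraction
from itertools import product

# Sample space of flipping three coins: all triples over {head, tail}.
S = set(product(["head", "tail"], repeat=3))
prob = {s: Fraction(1, len(S)) for s in S}          # uniform distribution, 1/8 each


def P(event: set) -> Fraction:
    """Probability of an event (a subset of S): the sum of its elementary probabilities."""
    return sum((prob[s] for s in event), Fraction(0))


head = {s for s in S if "head" in s}                # at least one head
direct = P(head)
via_complement = 1 - P(S - head)                    # S - Head = {(tail, tail, tail)}
```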

Exercise 2.2.5.4. Let $n \ge k \ge 0$ be two integers. Consider the experiment of flipping $n$ coins and the corresponding sample space $S = \{(x_1, x_2, \ldots, x_n) \mid x_i \in \{head, tail\} \text{ for } i = 1, \ldots, n\}$. What is the probability of getting exactly $k$ heads if $Prob$ is the uniform probability distribution on $S$? □

The notions defined above are suitable when looking at an experiment only once, i.e., at the very end. But sometimes we can obtain partial information about the outcome of the experiment by some intermediate observation. For instance, we flip three coins one after another and look at the result of the first coin flipping. Knowing this result, we ask what the probability is of getting at least two heads in the whole experiment. Or, somebody tells you that the result $(x, y, z)$ contains at least one head, and knowing this fact you have to estimate the probability that $(x, y, z)$ contains at least two heads. Tasks of this kind lead to the following definition of conditional probabilities.







Definition 2.2.5.5. Let $S$ be a sample space with a probability distribution $Prob$. The conditional probability of an event $X \subseteq S$ given that another event $Y \subseteq S$ occurs (with certainty) is

$$Prob(X \mid Y) = \frac{Prob(X \cap Y)}{Prob(Y)}$$

whenever $Prob(Y) \neq 0$. We also say that $Prob(X \mid Y)$ is the probability of $X$ given $Y$.



Observe that the definition of the conditional probability is natural. $X \cap Y$ consists of elementary events that are in both $X$ and $Y$. Since we know that $Y$ happens, it is clear that no event from $X - Y$ can happen. Dividing $Prob(X \cap Y)$ by $Prob(Y)$, we normalize the probabilities of all elementary events in $Y$, since

$$\sum_{e \in Y} \frac{Prob(\{e\})}{Prob(Y)} = \frac{1}{Prob(Y)} \cdot \sum_{e \in Y} Prob(\{e\}) = \frac{1}{Prob(Y)} \cdot Prob(Y) = 1.$$

Intuitively, it means that we replace $S$ by $Y$, because $Y$ happens with certainty. Thus, the conditional probability of $X$ given $Y$ is the ratio of the probability of the event $X \cap Y$ to the probability of $Y$.



Example 2.2.5.6. Let us again consider the experiment of flipping three coins. Let $X$ be the event that the result contains at least two heads, and let $Y$ be the event that the result contains at least one head. Since $X \cap Y = X$, we obtain

$$Prob(X \mid Y) \overset{\text{def.}}{=} \frac{Prob(X \cap Y)}{Prob(Y)} = \frac{Prob(X)}{Prob(Y)} = \frac{4/8}{7/8} = \frac{4}{7}.$$

□
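Definition 2.2.5.5 translates directly into code. In the following Python sketch (our own names, exact fractions, uniform distribution assumed), `P_given(X, Y)` implements $Prob(X \mid Y) = Prob(X \cap Y)/Prob(Y)$ and rejects $Prob(Y) = 0$; applied to the events of Example 2.2.5.6 it reproduces the value $4/7$.

```python
from fractions import Fraction
from itertools import product

S = set(product(["head", "tail"], repeat=3))        # three fair coins


def P(event: set) -> Fraction:
    """Uniform distribution on S: |event| / |S|."""
    return Fraction(len(event), len(S))


def P_given(X: set, Y: set) -> Fraction:
    """Conditional probability Prob(X | Y) = Prob(X ∩ Y) / Prob(Y), for Prob(Y) != 0."""
    if P(Y) == 0:
        raise ValueError("Prob(Y) must be nonzero")
    return P(X & Y) / P(Y)


X = {s for s in S if s.count("head") >= 2}          # at least two heads
Y = {s for s in S if s.count("head") >= 1}          # at least one head
```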



Definition 2.2.5.7. Let $S$ be a sample space with a probability distribution $Prob$. Two events $X, Y \subseteq S$ are independent if

$$Prob(X \cap Y) = Prob(X) \cdot Prob(Y).$$

The following observation relates independence with conditional probability, and it provides an equivalent definition of the independence of two events.

Observation 2.2.5.8. Let $S$ be a sample space with a probability distribution $Prob$. Let $X, Y \subseteq S$, and let $Prob(Y) \neq 0$. Then $X$ and $Y$ are independent if and only if $Prob(X \mid Y) = Prob(X)$.

Proof. (i) If $X$ and $Y$ are independent, then $Prob(X \cap Y) = Prob(X) \cdot Prob(Y)$. So,

$$Prob(X \mid Y) \overset{\text{def.}}{=} \frac{Prob(X \cap Y)}{Prob(Y)} = \frac{Prob(X) \cdot Prob(Y)}{Prob(Y)} = Prob(X).$$







(ii) Let $Prob(X \mid Y) = Prob(X)$ and $Prob(Y) \neq 0$. Then

$$Prob(X) = Prob(X \mid Y) \overset{\text{def.}}{=} \frac{Prob(X \cap Y)}{Prob(Y)},$$

which directly implies $Prob(X \cap Y) = Prob(X) \cdot Prob(Y)$. □

Due to Observation 2.2.5.8, we see that if two events $X$ and $Y$ are independent, then the knowledge that $X$ ($Y$) occurs with certainty does not change the probability $Prob(Y)$ ($Prob(X)$). This corresponds to our intuitive meaning of the independence of two events $X$ and $Y$: if one knows that the experiment results in an event $X$, one cannot obtain any partial information about the correspondence between this result and the event $Y$.

Example 2.2.5.9. Consider our standard experiment of flipping three coins. Let

$$X = \{(head, head, head), (head, head, tail), (head, tail, head), (head, tail, tail)\}$$

be the event that the result of the first coin flipping is head. Obviously, $Prob(X) = \frac{4}{8} = \frac{1}{2}$. Let

$$Y = \{(head, tail, head), (head, tail, tail), (tail, tail, head), (tail, tail, tail)\}$$

be the event that the result of the second coin flipping is tail. Clearly, $Prob(Y) = \frac{1}{2}$. Since $X \cap Y = \{(head, tail, head), (head, tail, tail)\}$,

$$Prob(X \cap Y) = \frac{2}{8} = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = Prob(X) \cdot Prob(Y).$$

Thus, $X$ and $Y$ are independent, and this corresponds to our intuition, because the result of the first coin flipping does not have any influence on the result of the second coin flipping. □
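Independence in the sense of Definition 2.2.5.7 can be tested mechanically for events over a finite uniform sample space. The Python sketch below (hypothetical helper names) confirms that the events $X$ and $Y$ of Example 2.2.5.9 are independent, whereas the events “at least two heads” and “at least one head” of Example 2.2.5.6 are not, since there $Prob(X \mid Y) = 4/7 \neq 1/2 = Prob(X)$.

```python
from fractions import Fraction
from itertools import product

S = set(product(["head", "tail"], repeat=3))        # three fair coins


def P(event: set) -> Fraction:
    """Uniform distribution on S."""
    return Fraction(len(event), len(S))


def independent(A: set, B: set) -> bool:
    """Definition 2.2.5.7: Prob(A ∩ B) == Prob(A) * Prob(B)."""
    return P(A & B) == P(A) * P(B)


X = {s for s in S if s[0] == "head"}                # first flip is head
Y = {s for s in S if s[1] == "tail"}                # second flip is tail
two_heads = {s for s in S if s.count("head") >= 2}
one_head = {s for s in S if s.count("head") >= 1}
```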

Exercise 2.2.5.10. Determine all pairs of independent events of the experiment from the above example. □

Exercise 2.2.5.11. Let $S$ be a sample space with a probability distribution $Prob$. Prove that for all events $A_1, A_2, \ldots, A_n \subseteq S$ such that $Prob(A_1) \neq 0$, $Prob(A_1 \cap A_2) \neq 0$, $\ldots$, $Prob(A_1 \cap A_2 \cap \cdots \cap A_{n-1}) \neq 0$,

$$Prob(A_1 \cap A_2 \cap \cdots \cap A_n) = Prob(A_1) \cdot Prob(A_2 \mid A_1) \cdot Prob(A_3 \mid A_1 \cap A_2) \cdots Prob(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$

□

Theorem 2.2.5.12 (Bayes' Theorem). Let $S$ be a sample space with a probability distribution $Prob$. For every two events $X, Y \subseteq S$ with nonzero probability,







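(i) $Prob(X \mid Y) = \dfrac{Prob(X) \cdot Prob(Y \mid X)}{Prob(Y)}$,

(ii) $Prob(X \mid Y) = \dfrac{Prob(X) \cdot Prob(Y \mid X)}{Prob(X) \cdot Prob(Y \mid X) + Prob(S - X) \cdot Prob(Y \mid S - X)}$,

where (ii) additionally assumes $Prob(S - X) \neq 0$, so that $Prob(Y \mid S - X)$ is defined.

Proof. (i) From the definition of the conditional probability we obtain

$$Prob(X \cap Y) = Prob(Y) \cdot Prob(X \mid Y) = Prob(X) \cdot Prob(Y \mid X).$$

The last equality directly implies

$$Prob(X \mid Y) = \frac{Prob(X) \cdot Prob(Y \mid X)}{Prob(Y)}.$$

(ii) To get the equality (ii) from (i), we show that $Prob(Y)$ can be expressed as

$$Prob(X) \cdot Prob(Y \mid X) + Prob(S - X) \cdot Prob(Y \mid S - X).$$

Since $Y = (Y \cap X) \cup (Y \cap (S - X))$ and $(Y \cap X) \cap (Y \cap (S - X)) = \emptyset$,

$$\begin{aligned}
Prob(Y) &= Prob(Y \cap X) + Prob(Y \cap (S - X)) \\
&= Prob(X) \cdot Prob(Y \mid X) + Prob(S - X) \cdot Prob(Y \mid S - X).
\end{aligned}$$

□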
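Both forms of Bayes' Theorem can be checked numerically on the three-coin experiment. In the following Python sketch (names are ours), $X$ is “at least two heads” and $Y$ is “the first coin shows head”; the definition of $Prob(X \mid Y)$ and the two formulas of Theorem 2.2.5.12 all give $3/4$.

```python
from fractions import Fraction
from itertools import product

S = frozenset(product(["head", "tail"], repeat=3))  # three fair coins


def P(event: frozenset) -> Fraction:
    """Uniform distribution on S."""
    return Fraction(len(event), len(S))


def P_given(X: frozenset, Y: frozenset) -> Fraction:
    """Prob(X | Y) = Prob(X ∩ Y) / Prob(Y); assumes Prob(Y) != 0."""
    return P(X & Y) / P(Y)


def bayes_i(X: frozenset, Y: frozenset) -> Fraction:
    """Theorem 2.2.5.12 (i)."""
    return P(X) * P_given(Y, X) / P(Y)


def bayes_ii(X: frozenset, Y: frozenset) -> Fraction:
    """Theorem 2.2.5.12 (ii); assumes Prob(X) != 0 and Prob(S - X) != 0."""
    not_X = S - X
    numerator = P(X) * P_given(Y, X)
    return numerator / (numerator + P(not_X) * P_given(Y, not_X))


X = frozenset(s for s in S if s.count("head") >= 2)
Y = frozenset(s for s in S if s[0] == "head")
```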



Now, we define a notion that is crucial for the analysis of the behavior of randomized algorithms. Let $S$ be a sample space.$^{20}$ Any function $F$ from $S$ to $\mathbb{R}$ is called a (discrete) random variable on $S$. This means that we associate a real number with every elementary event of $S$ (outcome of the experiment). To see a motivation for this notion, one can consider the work of a randomized algorithm $A$ on a fixed input $x$ as the experiment and one run (computation) of $A$ as an elementary event. If $F$ assigns the length (time complexity) of $C$ to every run $C$ of $A$, then we can analyze the “expected” time complexity of $A$ by the probability distribution induced by $F$ on $\mathbb{R}$. Then one can ask what the probability is that $A$ computes the output in a given time $t$ or, vice versa, what the smallest time $t'$ is such that $A$ finishes the work in time $t'$ with probability at least 1/2. Another possibility to define $F$ is that $F$ assigns 1 to a particular run $C$ of $A$ on $x$ if the output produced in this run is correct, and $F$ assigns 0 to $C$ if the output is wrong. Then the “average” (expected) value of $F$ gives us information about the reliability of the algorithm $A$. Thus, the free choice of a random variable provides a powerful instrument for the analysis of the behavior of the considered random experiment. Note that an appropriate choice of random variables not only decides which properties of the experiment will be investigated, but may also influence the success of this analytic approach as well as the difficulty (efficiency) of the execution of this analysis.

$^{20}$ Remember that we assume $S$ is either finite or countably infinite.







Definition 2.2.5.13. Let $S$ be a sample space with a probability distribution $Prob$, and let $F$ be a random variable on $S$. For every $x \in \mathbb{R}$, we define the event $F = x$ by

$$Event(F = x) = \{s \in S \mid F(s) = x\}.$$

The function $f_F : \mathbb{R} \to [0, 1]$ defined by

$$f_F(x) = Prob(Event(F = x))$$

is called the probability density function of the random variable $F$.

The distribution function of $F$ is a function $Dis_F : \mathbb{R} \to [0, 1]$ defined by

$$Dis_F(x) = Prob(F \le x) = \sum_{y \le x} Prob(Event(F = y)).$$

In what follows, we use the notation $F = x$ instead of $Event(F = x)$, and so the notation $Prob(F = x)$ instead of $Prob(Event(F = x))$.

Observation 2.2.5.14. Let $S$, $Prob$, and $F$ be as in Definition 2.2.5.13. Then, for every $x \in \mathbb{R}$,

(i) $f_F(x) = Prob(F = x) = \sum_{s \in S,\, F(s) = x} Prob(\{s\})$,

(ii) $Prob(F = x) \ge 0$, and

(iii) $\sum_{y \in \mathbb{R}} Prob(F = y) = 1$.

□

Example 2.2.5.15. Consider the experiment of rolling three 6-sided dice. The outcome of rolling one die is one of the numbers 1, 2, 3, 4, 5, and 6, and we consider the probability distribution to be uniform. So, $S = \{(a, b, c) \mid a, b, c \in \{1, 2, 3, 4, 5, 6\}\}$ and $Prob(\{s\}) = \frac{1}{216}$ for every $s \in S$. Define the random variable $F$ to be the sum of the values on all three dice. For instance, $F((3, 1, 5)) = 3 + 1 + 5 = 9$. The probability of the event $F = 5$ is

$$\begin{aligned}
Prob(F = 5) &= \sum_{\substack{s \in S \\ F(s) = 5}} Prob(\{s\}) = \sum_{\substack{a + b + c = 5 \\ a, b, c \in \{1, 2, \ldots, 6\}}} Prob(\{(a, b, c)\}) \\
&= Prob(\{(1, 1, 3)\}) + Prob(\{(1, 3, 1)\}) + Prob(\{(3, 1, 1)\}) \\
&\quad + Prob(\{(1, 2, 2)\}) + Prob(\{(2, 1, 2)\}) + Prob(\{(2, 2, 1)\}) \\
&= 6 \cdot \frac{1}{216} = \frac{1}{36}.
\end{aligned}$$

Let $G$ be the random variable defined by $G((a, b, c)) = \max\{a, b, c\}$ for every elementary event $(a, b, c) \in S$. Then

$$\begin{aligned}
Prob(G = 3) &= \sum_{\substack{s \in S \\ G(s) = 3}} Prob(\{s\}) = \sum_{\substack{\max\{a, b, c\} = 3 \\ a, b, c \in \{1, 2, \ldots, 6\}}} Prob(\{(a, b, c)\}) \\
&= \sum_{b, c \in \{1, 2\}} Prob(\{(3, b, c)\}) + \sum_{a, c \in \{1, 2\}} Prob(\{(a, 3, c)\}) + \sum_{a, b \in \{1, 2\}} Prob(\{(a, b, 3)\}) \\
&\quad + \sum_{a \in \{1, 2\}} Prob(\{(3, 3, a)\}) + \sum_{b \in \{1, 2\}} Prob(\{(3, b, 3)\}) + \sum_{a \in \{1, 2\}} Prob(\{(a, 3, 3)\}) \\
&\quad + Prob(\{(3, 3, 3)\}) \\
&= \frac{4}{216} + \frac{4}{216} + \frac{4}{216} + \frac{2}{216} + \frac{2}{216} + \frac{2}{216} + \frac{1}{216} = \frac{19}{216}.
\end{aligned}$$

□
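The probability density function and the distribution function of Definition 2.2.5.13 can be tabulated by enumerating all 216 outcomes. In the Python sketch below (names are our own), `density(F)` returns $f_F$ as a dictionary from values to exact probabilities, and `distribution(f, x)` evaluates $Dis_F(x) = Prob(F \le x)$; with $F$ the sum and $G$ the maximum of the three dice, it reproduces $Prob(F = 5) = 1/36$ and $Prob(G = 3) = 19/216$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from typing import Callable

S = list(product(range(1, 7), repeat=3))    # three 6-sided dice
p = Fraction(1, len(S))                     # uniform: 1/216 per outcome


def density(F: Callable[[tuple], int]) -> dict:
    """Probability density f_F(x) = Prob(F = x) of a random variable F on S."""
    f = defaultdict(Fraction)               # Fraction() is 0
    for s in S:
        f[F(s)] += p
    return dict(f)


def distribution(f: dict, x: float) -> Fraction:
    """Distribution function Dis_F(x) = Prob(F <= x), given the density f of F."""
    return sum((q for y, q in f.items() if y <= x), Fraction(0))


f_sum = density(sum)    # F((a, b, c)) = a + b + c
f_max = density(max)    # G((a, b, c)) = max{a, b, c}
```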



Thus, we have seen that one can define several random variables on the same sample space.

Definition 2.2.5.16. Let $S$ be a sample space with a probability distribution $Prob$, and let $X$ and $Y$ be two random variables on $S$. The joint probability density function of $X$ and $Y$ is the function $f_{X,Y} : \mathbb{R} \times \mathbb{R} \to [0, 1]$ defined by

$$f_{X,Y}(x, y) = Prob(X = x \text{ and } Y = y) = Prob(Event(X = x) \cap Event(Y = y)).$$

$X$ and $Y$ are independent if, for all $x, y \in \mathbb{R}$,

$$Prob(X = x \text{ and } Y = y) = Prob(X = x) \cdot Prob(Y = y).$$

□

Note that the above definition of the independence of $X$ and $Y$ is a natural extension of the notion of the independence of two events. Obviously,

$$Prob(X = x) = \sum_{y \in \mathbb{R}} Prob(X = x \text{ and } Y = y)$$

and

$$Prob(Y = y) = \sum_{x \in \mathbb{R}} Prob(X = x \text{ and } Y = y).$$

Applying the notion of conditional probabilities,

$$Prob(X = x \mid Y = y) = \frac{Prob(X = x \text{ and } Y = y)}{Prob(Y = y)}.$$

Thus, if $Event(X = x)$ and $Event(Y = y)$ are independent, then $Prob(X = x \mid Y = y) = Prob(X = x)$, and we obtain the definition of the independence of $X$ and $Y$.

The simplest and most useful characterization of the distribution of a 
random variable is the average of the values it takes on. This average value will 
be called the expected value in what follows. A good algorithmic motivation 
for the study of the expected value may be the relation to the analysis of the 
behavior and the time complexity of randomized algorithms. 







Definition 2.2.5.17. Let $S$ be a sample space with a probability distribution $Prob$, and let $X$ be a random variable on $S$. The expected value (or expectation) of $X$ is

$$E[X] = \sum_{x \in \mathbb{R}} x \cdot Prob(X = x),$$

if the sum is finite or converges absolutely.

Exercise 2.2.5.18. Let $S$ be a sample space with a probability distribution $Prob$, and let $X$ be a random variable on $S$. Prove that

$$E[X] = \sum_{s \in S} X(s) \cdot Prob(\{s\}).$$

□

Example 2.2.5.19. Consider the experiment of rolling one 6-sided die, and the random variable $F$ defined by $F(a) = a$ for $a \in S = \{1, 2, \ldots, 6\}$. Then

$$E[F] = \sum_{a \in S} F(a) \cdot Prob(F = a) = \sum_{a \in S} a \cdot \frac{1}{6} = \frac{1}{6} \sum_{a \in S} a = \frac{21}{6} = \frac{7}{2}.$$

□



In what follows, a discrete random variable is called an indicator variable if it takes only the values 0 and 1. An indicator variable $X$ is used to denote the occurrence or nonoccurrence of an event $E$, where

$$E = \{s \in S \mid X(s) = 1\} \text{ and } S - E = \{s \in S \mid X(s) = 0\}.$$

The above-mentioned variable $F$, with $F(C) = 1$ if the run $C$ of a randomized algorithm computes the right output, is an example of an indicator variable.

If $X$ is a random variable and $g$ a function from $\mathbb{R}$ to $\mathbb{R}$, then $g(X)$ is a random variable, too. If the expectation of $g(X)$ is defined, then clearly$^{21}$

$$E[g(X)] = \sum_{x \in \mathbb{R}} g(x) \cdot Prob(X = x).$$

In particular, if $g(x) = r \cdot x$, then

$$E[g(X)] = E[r \cdot X] = r \cdot E[X].$$



Observation 2.2.5.20. Let $S$ be a sample space with a probability distribution $Prob$, and let $X$ and $Y$ be two random variables. Then

$$E[X + Y] = E[X] + E[Y].$$

□

The property of expectations presented in Observation 2.2.5.20 is called linearity of expectation.

$^{21}$ See Exercise 2.2.5.18.







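Theorem 2.2.5.21. Let $S$ be a sample space with a probability distribution $Prob$. For any two independent random variables $X$ and $Y$ with defined $E[X]$ and $E[Y]$, respectively,

$$E[X \cdot Y] = E[X] \cdot E[Y].$$

Proof.

$$\begin{aligned}
E[X \cdot Y] &= \sum_{z \in \mathbb{R}} z \cdot Prob(X \cdot Y = z) \\
&= \sum_{x} \sum_{y} x \cdot y \cdot Prob(X = x \text{ and } Y = y) \\
&= \sum_{x} \sum_{y} x \cdot y \cdot Prob(X = x) \cdot Prob(Y = y) \\
&= \left( \sum_{x} x \cdot Prob(X = x) \right) \cdot \left( \sum_{y} y \cdot Prob(Y = y) \right) \\
&= E[X] \cdot E[Y].
\end{aligned}$$

□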
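Theorem 2.2.5.21 can be illustrated with two dice. The Python sketch below (our own names; expectations computed as in Exercise 2.2.5.18) shows that for the independent values $X$ and $Y$ of the first and second die $E[X \cdot Y] = E[X] \cdot E[Y] = 49/4$, while independence cannot be dropped: for the product $X \cdot X$ one gets $E[X^2] = 91/6 \neq 49/4 = E[X]^2$.

```python
from fractions import Fraction
from itertools import product
from typing import Callable

S = list(product(range(1, 7), repeat=2))    # two 6-sided dice
p = Fraction(1, len(S))                     # uniform: 1/36 per outcome


def E(X: Callable[[tuple], int]) -> Fraction:
    """E[X] = sum over s in S of X(s) * Prob({s})."""
    return sum((X(s) * p for s in S), Fraction(0))


def first(s: tuple) -> int:
    return s[0]


def second(s: tuple) -> int:
    return s[1]


def product_xy(s: tuple) -> int:
    return first(s) * second(s)


def square_x(s: tuple) -> int:
    return first(s) ** 2
```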

Definition 2.2.5.22. Let $S$ be a sample space with a probability distribution $Prob$. Let $X_1, X_2, \ldots, X_n$, $n \in \mathbb{N}^+$, be random variables on $S$. We say that $X_1, X_2, \ldots, X_n$ are mutually independent if, for all $x_1, x_2, \ldots, x_n \in \mathbb{R}$,

$$Prob(X_1 = x_1 \text{ and } X_2 = x_2 \text{ and } \ldots \text{ and } X_n = x_n) = Prob(X_1 = x_1) \cdot Prob(X_2 = x_2) \cdots Prob(X_n = x_n).$$

Exercise 2.2.5.23. Prove the following generalization of Theorem 2.2.5.21. For any $n$ random variables $X_1, X_2, \ldots, X_n$ that are mutually independent,

$$E[X_1 \cdot X_2 \cdots X_n] = E[X_1] \cdot E[X_2] \cdots E[X_n].$$

□

Example 2.2.5.24. Consider the experiment of consecutively rolling three 6-sided dice and the random variable $F$ defined by $F((a, b, c)) = 3a + 2b + c$ for every $(a, b, c) \in S$. We want to compute $E[F]$ without working with the sum $\sum_{x \in \mathbb{R}} x \cdot Prob(F = x)$ consisting of 215 additions. We define three random variables $F_1$, $F_2$, and $F_3$ as follows:

$$F_1(a, b, c) = a, \quad F_2(a, b, c) = b, \quad \text{and} \quad F_3(a, b, c) = c.$$

Obviously, $F = 3F_1 + 2F_2 + F_3$. Since $E[F_i] = 7/2$ for $i \in \{1, 2, 3\}$, as computed in the previous example, the linearity of expectation gives

$$E[F] = 3 \cdot E[F_1] + 2 \cdot E[F_2] + E[F_3] = 6 \cdot \frac{7}{2} = 21.$$

□
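The saving offered by linearity of expectation in Example 2.2.5.24 is easy to confirm. The Python sketch below (names are ours) computes $E[F]$ for $F((a, b, c)) = 3a + 2b + c$ once by brute force over all 216 outcomes (as in Exercise 2.2.5.18) and once as $3E[F_1] + 2E[F_2] + E[F_3]$; both give 21.

```python
from fractions import Fraction
from itertools import product
from typing import Callable

S = list(product(range(1, 7), repeat=3))    # three 6-sided dice
p = Fraction(1, len(S))                     # uniform: 1/216 per outcome


def E(X: Callable[[tuple], int]) -> Fraction:
    """E[X] = sum over s in S of X(s) * Prob({s})  (Exercise 2.2.5.18)."""
    return sum((X(s) * p for s in S), Fraction(0))


def F1(s: tuple) -> int:
    return s[0]


def F2(s: tuple) -> int:
    return s[1]


def F3(s: tuple) -> int:
    return s[2]


def F(s: tuple) -> int:
    return 3 * s[0] + 2 * s[1] + s[2]


brute_force = E(F)                                 # sum over all 216 outcomes
by_linearity = 3 * E(F1) + 2 * E(F2) + E(F3)       # three single-die expectations
```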

The following two examples illustrate the usefulness of the notions of random variable and expectation for practical purposes in algorithmics. The first example shows how the investigation of $E[X]$ of a random variable $X$ can lead to the design of an efficient algorithm for a given task. The second example shows how to use these notions to analyze the complexity of a randomized algorithm.

Example 2.2.5.25. Let $F = F(x_1, \ldots, x_n)$ be a formula in conjunctive normal form over a set $\{x_1, \ldots, x_n\}$ of $n$ variables. Our aim is to find an assignment to $\{x_1, \ldots, x_n\}$ that satisfies as many clauses as possible. Using a simple probabilistic consideration, we show that there exists an input assignment that satisfies at least half of the clauses.

Let $F$ consist of $m$ clauses, i.e., $F = F_1 \wedge F_2 \wedge \cdots \wedge F_m$. Consider the following experiment. We choose the values of $x_1, x_2, \ldots, x_n$ randomly with $Prob(x_i = 1) = Prob(x_i = 0) = 1/2$ for $i = 1, 2, \ldots, n$. Now, we define $m$ random variables $Z_1, \ldots, Z_m$. For $i = 1, \ldots, m$, $Z_i(\alpha) = 1$ if $F_i$ is satisfied by the assignment $\alpha$, and $Z_i(\alpha) = 0$ otherwise. For every clause of $k$ distinct literals, the probability that it is not satisfied by a random assignment to the set of variables is $2^{-k}$. This event takes place if and only if each literal gets the value 0, and the Boolean values are assigned independently to distinct literals in any clause. This implies that the probability that a clause with $k$ literals is satisfied is at least $1 - 2^{-k} \geq 1/2$ for every $k \geq 1$, i.e.,

$$E[Z_i] \geq 1/2$$

for all $i = 1, \ldots, m$.

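Now, we define a random variable $Z$ as $Z = \sum_{i=1}^{m} Z_i$. Obviously, $Z$ counts the number of satisfied clauses. Because of the linearity of expectation,

$$E[Z] = E\left[\sum_{i=1}^{m} Z_i\right] = \sum_{i=1}^{m} E[Z_i] \geq \sum_{i=1}^{m} \frac{1}{2} = \frac{m}{2}.$$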

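Thus, we are sure that there exists an assignment satisfying at least one half of the clauses of $F$. Since $E[Z] \geq m/2$, and a random variable cannot be smaller than its expectation for all outcomes, at least one assignment $\alpha$ satisfies $Z(\alpha) \geq m/2$. The following algorithm outputs an assignment whose expected number of satisfied clauses is at least $m/2$.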

Algorithm 2.2.5.26 (Random Assignment).

Input: A formula $F = F(x_1, \ldots, x_n)$ over $n$ variables $x_1, \ldots, x_n$ in CNF.

Step 1: Choose uniformly at random $n$ Boolean values $a_1, \ldots, a_n$ and set $x_i = a_i$ for $i = 1, \ldots, n$.

Step 2: Evaluate each clause of $F$ and set $Z :=$ the number of satisfied clauses.

Output: $(a_1, a_2, \ldots, a_n)$, $Z$.

□ 
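A minimal Python sketch of Algorithm 2.2.5.26 follows. It assumes a DIMACS-like encoding that the text does not prescribe: a CNF formula is a list of clauses, and a clause is a list of nonzero integers, where `+i` stands for $x_i$ and `-i` for $\overline{x_i}$. The hypothetical helper `expected_satisfied` computes $E[Z]$ exactly by averaging over all $2^n$ assignments. This is only feasible for small $n$, but it makes the bound $E[Z] \geq m/2$ visible.

```python
import random
from itertools import product


def satisfied_count(clauses, a):
    """Number of clauses satisfied by the 0/1 assignment a (a[i-1] is the value of x_i)."""
    return sum(
        any((a[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def random_assignment(clauses, n, rng=random):
    """Algorithm 2.2.5.26: pick a_1..a_n uniformly at random, count satisfied clauses."""
    a = [rng.randint(0, 1) for _ in range(n)]   # Step 1
    z = satisfied_count(clauses, a)             # Step 2
    return a, z


def expected_satisfied(clauses, n):
    """Exact E[Z]: average of Z over all 2^n equally likely assignments."""
    total = sum(satisfied_count(clauses, a) for a in product((0, 1), repeat=n))
    return total / 2 ** n


# F = (x1 or not x2) and (x2) and (not x1 or x2 or x3), so m = 3.
F = [[1, -2], [2], [-1, 2, 3]]
print(random_assignment(F, 3, random.Random(0)))
print(expected_satisfied(F, 3))  # 3/4 + 1/2 + 7/8 = 2.125 >= m/2 = 1.5
```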



Exercise 2.2.5.27. Let $F$ be a formula in CNF every clause of which consists of at least $k$ distinct variables, $k \geq 2$. Which lower bound on $E[Z]$ can be proved in this case? □







Example 2.2.5.28. Consider the task$^{22}$ of sorting a set $S$ of $n$ elements into increasing order. One of the well-known recursive algorithms for this task is the following Randomized Quicksort RQS($S$).

Algorithm 2.2.5.29. RQS($S$)

Input: A set $S$ of numbers.

Step 1: Choose an element $a$ uniformly at random from $S$.
{every element in $S$ has the probability $1/|S|$ of being chosen}

Step 2: $S_< := \{b \in S \mid b < a\}$;
$S_> := \{c \in S \mid c > a\}$;

Step 3: output(RQS($S_<$), $a$, RQS($S_>$)).

Output: The sequence of elements of $S$ in increasing order.
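The following Python sketch implements RQS and counts comparisons the way the analysis below does, charging $|S| - 1$ comparisons to Step 2 of every call. Two details are assumptions not spelled out in the algorithm above. First, the recursion stops for sets with at most one element. Second, the input consists of distinct numbers, since it is a set. The shared one-element list `counter` is an illustrative device for returning the comparison count.

```python
import random


def rqs(s, rng=random, counter=None):
    """Randomized Quicksort on a collection of distinct numbers.

    If counter is a one-element list, counter[0] is increased by the number of
    comparisons with the pivot, i.e., |S| - 1 per call with |S| >= 2.
    """
    s = list(s)
    if len(s) <= 1:                      # assumed base case
        return s
    a = rng.choice(s)                    # Step 1: pivot chosen uniformly at random
    smaller, larger = [], []
    for b in s:                          # Step 2: compare every other element with a
        if b == a:
            continue
        if counter is not None:
            counter[0] += 1
        if b < a:
            smaller.append(b)
        else:
            larger.append(b)
    # Step 3: output(RQS(S_<), a, RQS(S_>))
    return rqs(smaller, rng, counter) + [a] + rqs(larger, rng, counter)


count = [0]
print(rqs({5, 3, 9, 1, 7}, random.Random(42), count), count[0])
```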

The goal of this example is to show that the notions of random variables and expectation can be helpful to estimate the “average” (expected) complexity of this algorithm. As usual for sorting, the complexity is measured in the number of comparisons of pairs of elements of $S$. Let $|S| = n$ for a positive integer $n$.

We observe that the complexity of Step 2 is exactly $|S| - 1 = n - 1$. Intuitively, the best random choices of elements from $S$ are choices dividing $S$ into two approximately equal-sized sets $S_<$ and $S_>$. In the terminology of recursion, this means that the original problem of size $n$ is reduced to two problems of size $n/2$. So, if $T(n)$ denotes the complexity for this kind of choices, then

$$T(n) \leq 2 \cdot T(n/2) + n - 1.$$

Following Section 2.2.2, we already know that the solution of this recurrence is $T(n) \in O(n \cdot \log n)$. A very bad sequence of random choices is one in which the smallest element of the given set is always chosen. In this case the number of comparisons is

$$T(n) = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} \in O(n^2).$$

One can still show that, for the recurrence inequality

$$T(n) \leq T\left(\frac{n}{4}\right) + T\left(\frac{3}{4} n\right) + n - 1,$$

$T(n) \in O(n \log n)$. Hence RQS will behave well also if the size $|S_<|$ of $S_<$ only very roughly approximates $|S_>|$. This happens with probability at least $1/2$, because at least half of the elements are good choices (for instance, the elements whose rank lies between $n/4$ and $3n/4$). This is the reason for our hope that algorithm RQS behaves very well on average. In what follows we carefully analyze the expected complexity of RQS.



22 One of the fundamental computing problems.







Let $s_1, s_2, \ldots, s_n$ be the output$^{23}$ of the algorithm RQS. Our experiment is the sequence of random choices of RQS. We define the random variable $X_{ij}$ by

$$X_{ij} = \begin{cases} 1 & \text{if } s_i \text{ and } s_j \text{ are compared in the run of RQS,} \\ 0 & \text{otherwise,} \end{cases}$$

for all $i, j \in \{1, \ldots, n\}$, $i < j$. Obviously, the random variable

$$T = \sum_{i=1}^{n} \sum_{j > i} X_{ij}$$

counts the total number of comparisons. So,

$$E[T] = E\left[\sum_{i=1}^{n} \sum_{j>i} X_{ij}\right] = \sum_{i=1}^{n} \sum_{j>i} E[X_{ij}] \qquad (2.25)$$

is the expected complexity of the algorithm RQS.$^{24}$ It remains to estimate $E[X_{ij}]$.

Let $p_{ij}$ denote the probability that $s_i$ and $s_j$ are compared in an execution. Since $X_{ij}$ is either 1 or 0,

$$E[X_{ij}] = p_{ij} \cdot 1 + (1 - p_{ij}) \cdot 0 = p_{ij}.$$

Now, consider for every $i, j \in \{1, \ldots, n\}$, $i < j$, the subsequence

$$s_i, s_{i+1}, \ldots, s_{j-1}, s_j.$$

If some $s_d$ with $i < d < j$ was randomly chosen by RQS($S$) before either $s_i$ or $s_j$ has been randomly chosen, then $s_i$ and $s_j$ were not compared.$^{25}$ If $s_i$ or $s_j$ has been randomly chosen to play the splitting role before any of the elements from $\{s_{i+1}, s_{i+2}, \ldots, s_{j-1}\}$ have been randomly chosen, then $s_i$ and $s_j$ were compared in the corresponding run$^{26}$ of RQS($S$). As long as no element of $\{s_i, s_{i+1}, \ldots, s_j\}$ has been chosen, all of them belong to the same set in the recursion. Hence each of them is equally likely to be the first one chosen, and

$$p_{ij} = \frac{2}{j - i + 1}. \qquad (2.26)$$

Inserting (2.26) into (2.25) we finally obtain

$$
\begin{aligned}
E[T] = \sum_{i=1}^{n} \sum_{j>i} \frac{2}{j - i + 1}
&\leq \sum_{i=1}^{n} \sum_{k=1}^{n-i+1} \frac{2}{k}
\leq 2 \sum_{i=1}^{n} \sum_{k=1}^{n} \frac{1}{k} \\
&= 2 \sum_{i=1}^{n} Har(n) = 2n \cdot Har(n) = 2 \cdot n \cdot \ln n + \Theta(n).
\end{aligned}
$$

Thus, as expected, the expected time complexity of RQS is in $O(n \log n)$. □

23 That is, $s_1 < s_2 < \cdots < s_n$.

24 Note that (2.25) holds because of the linearity of expectation.

25 This is because $s_i \in S_<$ and $s_j \in S_>$ according to $s_d$.

26 In the run corresponding to this sequence of random choices.
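The bound can be compared numerically with the exact value of $E[T]$ obtained from (2.25) and (2.26). The sketch below evaluates $\sum_{i=1}^{n} \sum_{j>i} 2/(j - i + 1)$ with exact rational arithmetic and checks it against $2n \cdot Har(n)$. The function names are illustrative.

```python
from fractions import Fraction


def har(n):
    """Har(n) = 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def expected_comparisons(n):
    """E[T] for RQS on n elements: sum over i < j of 2 / (j - i + 1)."""
    return sum(
        (Fraction(2, j - i + 1) for i in range(1, n + 1) for j in range(i + 1, n + 1)),
        Fraction(0),
    )


for n in (2, 3, 10, 100):
    exact = expected_comparisons(n)
    bound = 2 * n * har(n)
    print(n, float(exact), float(bound))   # the exact value stays below the bound
```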



Keywords introduced in Section 2.2.5

sample space, event, probability distribution, conditional probability, random variable, probability density function, distribution function, expected value (expectation), independence of events, independence of random variables, linearity of expectation



2.3 Fundamentals of Algorithmics 

2.3.1 Alphabets, Words, and Languages 

All data are represented as strings of symbols. The kind of data representation is often important for the efficiency of algorithm implementations. Here, we present some elementary fundamentals of formal language theory. We do not need to deal too much with details of data representation, because we consider algorithms on an abstract design level and do not often work with the details of implementation. The main goal of this section is to give definitions of notions that are sufficient for fixing the representation of some input data, and thus to precisely formalize the definitions of some fundamental algorithmic problems. We also need the terms defined here for the abstract considerations of complexity theory in Section 2.3.3 and for proving lower bounds on polynomial-time inapproximability in Section 4.4.2.

Definition 2.3.1.1. Any non-empty, finite set is called an alphabet. Every element of an alphabet $\Sigma$ is called a symbol of $\Sigma$.

An alphabet has the same meaning for algorithmics as for natural languages: it is a collection of signs or symbols used in a more or less uniform fashion by a number of people in order to represent information and so to communicate with each other. Thus, alphabets are used for communication between human and machine, between computers, and in algorithmic information processing. A symbol of an alphabet is often considered as a possible content of the computer word. In this interpretation, fixing an alphabet means fixing all possible computer words. Examples of alphabets are

$\Sigma_{bool} = \{0, 1\}$,

$\Sigma_{lat} = \{a, b, c, \ldots, z\}$,

$\Sigma_{logic} = \{0, 1, (, ), \wedge, \vee, \neg, x\}$.

Definition 2.3.1.2. Let $\Sigma$ be an alphabet. A word over $\Sigma$ is any finite sequence of symbols of $\Sigma$. The empty word $\lambda$ is the only word consisting of zero symbols. The set of all words over the alphabet $\Sigma$ is denoted by $\Sigma^*$.

The interpretation of the notion “word over $\Sigma$” is a text consisting of symbols of $\Sigma$ rather than a term representing a notion. So, the contents of a book can be considered as a word over some alphabet that includes the blank symbol and the symbols of $\Sigma_{lat}$.

$w = 0,1,0,0,1,0$ is a word over $\Sigma_{bool}$. In what follows we usually omit the commas and represent $w$ simply by 010010. So abcxyzef is a word over $\Sigma_{lat}$. For $\Sigma = \{a, b\}$, $\Sigma^* = \{\lambda, a, b, aa, ab, ba, bb, aaa, \ldots\}$.

Definition 2.3.1.3. The length of a word $w$ over an alphabet $\Sigma$, denoted by $|w|$, is the number of symbols in $w$ (i.e., the length of $w$ as a sequence). For every word $w \in \Sigma^*$ and every symbol $a \in \Sigma$, $\#_a(w)$ is the number of occurrences of the symbol $a$ in the word $w$.

For the word $w = 010010$, $|w| = 6$, $\#_0(w) = 4$, and $\#_1(w) = 2$. We observe that for every alphabet $\Sigma$ and every word $w \in \Sigma^*$,

$$|w| = \sum_{a \in \Sigma} \#_a(w).$$

Definition 2.3.1.4. Let $\Sigma$ be an alphabet. Then, for any $n \in \mathbb{N}$,

$$\Sigma^n = \{x \in \Sigma^* \mid |x| = n\}.$$

For instance, $\{a, b\}^3 = \{aaa, aab, aba, baa, abb, bab, bba, bbb\}$. We define $\Sigma^+ = \Sigma^* - \{\lambda\}$.

Definition 2.3.1.5. Given two words $v$ and $w$ over an alphabet $\Sigma$, we define the concatenation of $v$ and $w$, denoted by $vw$ (or by $v \cdot w$), as the word that consists of the symbols of $v$ in the same order, followed by the symbols of $w$ in the same order.

For every word $w \in \Sigma^*$, we define

(i) $w^0 = \lambda$, and

(ii) $w^{n+1} = w \cdot w^n = ww^n$ for every non-negative integer $n$.







A prefix of a word $w \in \Sigma^*$ is any word $v$ such that $w = vu$ for some word $u$ over $\Sigma$. A suffix of a word $w \in \Sigma^*$ is any word $u$ such that $w = xu$ for some word $x \in \Sigma^*$. A subword of a word $w$ over $\Sigma$ is any word $z \in \Sigma^*$ such that $w = uzv$ for some words $u, v \in \Sigma^*$.

The word abbcaa is the concatenation of the words ab and bcaa. The words abbcaa, a, ab, bca, and bbcaa are examples of subwords of $abbcaa = ab^2ca^2$. The words $a$, $ab$, $ab^2$, $ab^2c$, $ab^2ca$, and $ab^2ca^2$ are all prefixes of abbcaa. The words caa and $a^2$ are examples of suffixes of abbcaa.

Exercise 2.3.1.6. Prove that, for every alphabet $\Sigma$, $(\Sigma^*, \cdot)$, where $\cdot$ is the operation of concatenation, is a monoid. □

In what follows we use words to code data and so to represent input and output data, as well as the contents of the computer memory. Since the complexity of algorithms is measured according to the input length, the first step in the complexity analysis is to fix the alphabet and the data representation over this alphabet. This automatically determines the length of every input. We usually code integers as binary words. For every $u = u_n u_{n-1} \ldots u_2 u_1 \in \Sigma_{bool}^n$, $u_i \in \Sigma_{bool}$ for $i = 1, \ldots, n$,

$$Number(u) = \sum_{i=1}^{n} u_i \cdot 2^{i-1}$$

is the integer coded by $u$. Thus, for instance, $Number(000) = Number(0) = 0$, and $Number(1101) = 1 \cdot 2^0 + 0 \cdot 2^1 + 1 \cdot 2^2 + 1 \cdot 2^3 = 1 + 0 + 4 + 8 = 13$.
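A direct transcription of $Number(u)$ into Python might look as follows. `number` is an illustrative name. The input is assumed to be a string over $\{0, 1\}$ written as $u_n u_{n-1} \ldots u_1$, so the rightmost character is $u_1$.

```python
def number(u: str) -> int:
    """Number(u) = sum_{i=1}^{n} u_i * 2^(i-1) for u = u_n ... u_2 u_1."""
    if set(u) - {"0", "1"}:
        raise ValueError("u must be a word over {0, 1}")
    # reversed(u) yields u_1, u_2, ..., u_n, paired with exponents 0, 1, ..., n-1.
    return sum(int(bit) * 2 ** (i - 1) for i, bit in enumerate(reversed(u), start=1))


print(number("000"), number("0"), number("1101"))  # prints: 0 0 13
```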

One can use the alphabet $\{0, 1, \#\}$ to code graphs. If $M_G = [a_{ij}]_{i,j=1,\ldots,n}$ is an adjacency matrix of a graph $G$ of $n$ vertices, then the word

$$a_{11} a_{12} \ldots a_{1n} \# a_{21} a_{22} \ldots a_{2n} \# \cdots \# a_{n1} a_{n2} \ldots a_{nn}$$

can be used to code $G$.
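The coding of graphs by words over $\{0, 1, \#\}$ described above can be sketched in Python as follows. The functions `encode_graph` and `decode_graph` are hypothetical helpers. The matrix is assumed to be given as a list of rows of 0/1 integers.

```python
def encode_graph(adj):
    """Code an n x n 0/1 adjacency matrix as a11...a1n#a21...a2n#...#an1...ann."""
    return "#".join("".join(str(bit) for bit in row) for row in adj)


def decode_graph(word):
    """Inverse of encode_graph; rejects words that do not code a square 0/1 matrix."""
    rows = word.split("#")
    n = len(rows)
    for r in rows:
        if len(r) != n or set(r) - {"0", "1"}:
            raise ValueError("not the code of an adjacency matrix")
    return [[int(ch) for ch in r] for r in rows]


# Path v1 - v2 - v3 (undirected, so the matrix is symmetric).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(encode_graph(path))  # prints: 010#101#010
```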

Exercise 2.3.1.7. Design a representation of graphs by words over $\Sigma_{bool}$. □

Exercise 2.3.1.8. Design a representation of weighted graphs, where weights are some positive integers, using the alphabet $\{0, 1, \#\}$. □

One can use the alphabet $\Sigma_{logic}$ to represent formulae over a variable set $X = \{x_1, x_2, x_3, \ldots\}$ and the operations $\vee$, $\wedge$, and $\neg$. Since we have infinitely many variables, we cannot use the symbols $x_i$ as symbols of the alphabet. We code a variable $x_j$ by $x\,bin(j)$, where $bin(j)$ is the shortest word$^{27}$ over $\Sigma_{bool}$ such that $Number(bin(j)) = j$, and $x$ is a symbol of $\Sigma_{logic}$. Then, the code of a formula $\Phi$ can be obtained by simply exchanging $x_i$ with $x\,bin(i)$ for every occurrence of $x_i$ in $\Phi$. For instance, the formula

$$\Phi = (x_1 \vee \overline{x_4} \vee x_7) \wedge (x_2 \vee \overline{x_1}) \wedge (x_4 \wedge \overline{x_8})$$

is represented by the word

$$(x1 \vee \neg(x100) \vee x111) \wedge (x10 \vee \neg(x1)) \wedge (x100 \wedge \neg(x1000))$$

over $\Sigma_{logic} = \{0, 1, (, ), \wedge, \vee, \neg, x\}$.

27 Thus, the first (most significant) bit of $bin(j)$ is 1.
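The translation of variable names into codes over $\Sigma_{logic}$ can be sketched with regular expressions. The plain-text input syntax assumed here is a hypothetical convenience, not part of the text: variables are written `x_j` with decimal `j`, negation is written `¬x_j`, and the connectives are `∨` and `∧`. For $j \geq 1$, `format(j, "b")` yields exactly $bin(j)$, i.e., the binary word without leading zeros.

```python
import re


def code_formula(phi: str) -> str:
    """Replace every variable x_j by x bin(j); a negated variable ¬x_j becomes ¬(x bin(j)).
    Blanks are removed, since the blank is not a symbol of Sigma_logic."""
    def var(match):
        return "x" + format(int(match.group(1)), "b")

    phi = re.sub(r"¬x_(\d+)", lambda m: "¬(" + var(m) + ")", phi)
    phi = re.sub(r"x_(\d+)", var, phi)
    return phi.replace(" ", "")


phi = "(x_1 ∨ ¬x_4 ∨ x_7) ∧ (x_2 ∨ ¬x_1) ∧ (x_4 ∧ ¬x_8)"
print(code_formula(phi))
```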

Definition 2.3.1.9. Let $\Sigma$ be an alphabet. Every set $L \subseteq \Sigma^*$ is called a language over $\Sigma$. The complement of the language $L$ according to $\Sigma$ is $L^C = \Sigma^* - L$.

Let $\Sigma_1$ and $\Sigma_2$ be alphabets, and let $L_1 \subseteq \Sigma_1^*$ and $L_2 \subseteq \Sigma_2^*$ be languages. The concatenation of $L_1$ and $L_2$ is

$$L_1 L_2 = L_1 \circ L_2 = \{uv \in (\Sigma_1 \cup \Sigma_2)^* \mid u \in L_1 \text{ and } v \in L_2\}.$$

$\emptyset$, $\{\lambda\}$, $\{a, b\}$, $\{a, b\}^*$, $\{a^6, bba, b^{10}a^{20}\}$, $\{a^n b^{2n} \mid n \in \mathbb{N}\}$ are examples of languages over $\{a, b\}$. Observe that $L \cdot \emptyset = \emptyset \cdot L = \emptyset$ and $L \cdot \{\lambda\} = \{\lambda\} \cdot L = L$ for every language $L$. $U = \{1\} \cdot \{0, 1\}^*$ is the language of binary representations of all positive integers.

A language can be used to describe a set of consistent input instances of an algorithmic problem. Two examples of such languages follow. One is the set of all representations of formulae in CNF as a set of words over $\Sigma_{logic}$. The other is the set $\{u_1 \# u_2 \# \cdots \# u_m \mid u_i \in \{0, 1\}^m, m \in \mathbb{N}\}$, the set of representations of all directed graphs over $\{0, 1, \#\}$. But words can also be used to code programs, and so one can consider the language of codes of all correct programs in a given programming language. Languages can also be used to describe so-called decision problems, but this is the topic of the next section.

The last definition of this section shows how one can define a linear ordering on words over some alphabet $\Sigma$, provided one has a linear ordering on the symbols of $\Sigma$.

Definition 2.3.1.10. Let $\Sigma = \{s_1, s_2, \ldots, s_m\}$, $m \geq 1$, be an alphabet, and let $s_1 < s_2 < \cdots < s_m$ be a linear ordering on $\Sigma$. We define the canonical ordering on $\Sigma^*$ as follows. For all $u, v \in \Sigma^*$,

$$u < v \text{ if } |u| < |v|$$

or $|u| = |v|$, $u = x s_i u'$, and $v = x s_j v'$ for some $x, u', v' \in \Sigma^*$, and $i < j$.
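In Python, the canonical ordering can be realized by a sort key. The key compares lengths first and, for words of equal length, the ranks of the symbols from left to right. The sketch assumes the linear ordering on $\Sigma$ is given as a string listing $s_1, s_2, \ldots, s_m$ in increasing order. `canonical_key` and `words_up_to` are illustrative names, and the empty string plays the role of $\lambda$.

```python
from itertools import product


def canonical_key(word, order):
    """Sort key for the canonical ordering; order lists s_1 < s_2 < ... < s_m."""
    rank = {s: i for i, s in enumerate(order)}
    # Shorter words come first; equal lengths are compared at the first differing symbol.
    return (len(word), [rank[s] for s in word])


def words_up_to(order, max_len):
    """All words over the alphabet of length at most max_len, in canonical order."""
    words = ["".join(p) for k in range(max_len + 1) for p in product(order, repeat=k)]
    return sorted(words, key=lambda w: canonical_key(w, order))


print(words_up_to("ab", 2))  # prints: ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```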

Keywords introduced in Section 2.3.1 

alphabet, symbol, word, empty word, the length of a word, concatenation, prefix, 
suffix, subword, binary representation of integers, language, the complement of a 
language, canonical ordering of words 







2.3.2 Algorithmic Problems 

Thousands of algorithmic problems, classified according to different points of view, are considered in the literature on algorithmics. In this book we deal with hard problems only. We consider a problem to be hard if there is no known deterministic algorithm (computer program) that solves it efficiently. Efficiently means in low-degree polynomial time. Our interpretation of hardness here is connected to the current state of our knowledge in algorithmics rather than to the unknown, real difficulty of the problems considered. Thus, a problem is hard if one would need years or thousands of years to solve it by deterministic programs for an input of a realistic size appearing in current practice. This book provides a handbook of algorithmic methods that attack hard problems. Thousands of problems of great practical relevance are very hard from this point of view. Fortunately, we do not need to define and consider all of them. There are some crucial, paradigmatic problems that are pattern problems, in the sense that solving most of the hard problems can be reduced to solving some of them. Examples are the traveling salesperson problem, linear (integer) programming, the set cover problem, the knapsack problem, the satisfiability problem, and primality testing.

The goal of this chapter is to define some of these fundamental pattern problems. The methods for solving them are the topic of the next chapters. Every algorithm (computer program) can be viewed as an execution of a mapping from a subset of $\Sigma_1^*$ to $\Sigma_2^*$ for some alphabets $\Sigma_1$ and $\Sigma_2$. So, every (algorithmic) problem can be considered as a function from $\Sigma_1^*$ to $\Sigma_2^*$, or as a relation on $\Sigma_1^* \times \Sigma_2^*$, for some alphabets $\Sigma_1$ and $\Sigma_2$. We usually do not need to work with this kind of formalism, because we consider two classes of problems only. Decision problems ask whether a given input has a prescribed property. Optimization problems ask for the “best” solution from the set of solutions determined by some constraints. In what follows we define the fundamental problems that will be the objects of the algorithmic design in the subsequent chapters. We start with decision problems. If $A$ is an algorithm and $x$ is an input, then $A(x)$ denotes the output of $A$ for the input $x$.

Definition 2.3.2.1. A decision problem is a triple $(L, U, \Sigma)$ where $\Sigma$ is an alphabet and $L \subseteq U \subseteq \Sigma^*$. An algorithm $A$ solves (decides) the decision problem $(L, U, \Sigma)$ if, for every $x \in U$,

(i) $A(x) = 1$ if $x \in L$, and

(ii) $A(x) = 0$ if $x \in U - L$ ($x \notin L$).

We see that any algorithm $A$ solving a decision problem $(L, U, \Sigma)$ computes a function from $U$ to $\{0, 1\}$. The output “1” is interpreted as the answer “yes” to the question whether a given input belongs to $L$, i.e., whether the input has the property corresponding to the specification of the language $L$. The output “0” is equivalent to the answer “no”.







An equivalent form of a description of a decision problem is the following form that specifies the input-output behavior.

Problem $(L, U, \Sigma)$

Input: An $x \in U$.

Output: “yes” if $x \in L$,

“no” otherwise.

For many decision problems $(L, U, \Sigma)$ we assume $U = \Sigma^*$. In that case we shall use the short notation $(L, \Sigma)$ instead of $(L, \Sigma^*, \Sigma)$.

Next we present the fundamental decision problems that will be studied 
in Chapter 5. 

PRIMALITY TESTING.

Informally, primality testing is to decide, for a given positive integer, whether it is prime or not. Thus, primality testing is a decision problem $(Prim, \Sigma_{bool})$, where

$$Prim = \{w \in \{0, 1\}^* \mid Number(w) \text{ is a prime}\}.$$

Another description of this problem is

Primality testing

Input: An $x \in \Sigma_{bool}^*$.

Output: “yes” if $Number(x)$ is a prime,

“no” otherwise.

One can easily observe that primality testing can also be considered for other integer representations. Using $\Sigma_k = \{0, 1, 2, \ldots, k - 1\}$ and the $k$-ary representation of integers we obtain $(Prim_k, \Sigma_k)$, where

$$Prim_k = \{x \in \Sigma_k^* \mid x \text{ is the } k\text{-ary representation of a prime}\}.$$

From the point of view of computational hardness it is not essential whether we consider $(Prim, \Sigma_{bool})$ or $(Prim_k, \Sigma_k)$ for some constant $k$. One has efficient algorithms for transferring any $k$-ary representation of an integer to its binary representation, and vice versa. But this does not mean that the representation of integers does not matter for primality testing. Suppose one represents an integer $n$ as

$$\# bin(p_1) \# bin(p_2) \# \cdots \# bin(p_l)$$

over $\{0, 1, \#\}$, where $n = p_1 \cdot p_2 \cdot \ldots \cdot p_l$ and the $p_i$'s are the nontrivial prime factors of $n$. Then the problem of primality testing becomes easy, since $n$ is a prime if and only if $l = 1$. The hardness of algorithmic problems is sensitive to the representation of their inputs. This is sometimes the reason for taking an exact formal description of the problem that fixes the data representation, too. For primality testing we always consider $(Prim, \Sigma_{bool})$ as the formal definition of this decision problem.







EQUIVALENCE PROBLEM FOR POLYNOMIALS.

The problem is to decide, for a given prime $p$ and two polynomials $p_1(x_1, \ldots, x_m)$ and $p_2(x_1, \ldots, x_m)$ over the field $\mathbb{Z}_p$, whether $p_1$ and $p_2$ are equivalent, i.e., whether $p_1(x_1, \ldots, x_m) - p_2(x_1, \ldots, x_m)$ is identically 0. The crucial point is that the polynomials are not necessarily given in a normal form such as

$$a_0 + a_1 x_1 + a_2 x_2 + a_{12} x_1 x_2 + a_{11} x_1^2 + a_{22} x_2^2 + \cdots$$

but in an arbitrary form such as

$$(x_1 + 3x_2)^2 \cdot (2x_1 + 4x_4) \cdot x_1.$$

A normal form may be exponentially long in the length of another representation. So the obvious way to compare two polynomials, by transferring them to their normal forms and comparing their coefficients, is not efficient.

We omit the formal definition of the representation of polynomials in an arbitrary form over the alphabet $\Sigma_{pol} = \{0, 1, (, ), exp, +, \cdot\}$, because it can be done in a similar way as the representation of formulae over $\Sigma_{logic}$. The equivalence problem for polynomials can be defined as follows.

EQ-POL

Input: A prime $p$ and two polynomials $p_1$ and $p_2$ over variables from $X = \{x_1, x_2, \ldots\}$.

Output: “yes” if $p_1 \equiv p_2$ in the field $\mathbb{Z}_p$,
“no” otherwise.

EQUIVALENCE PROBLEM FOR ONE-TIME-ONLY BRANCHING PROGRAMS.

The equivalence problem for one-time-only branching programs, EQ-1BP, is to decide, for two given one-time-only branching programs $B_1$ and $B_2$, whether $B_1$ and $B_2$ represent the same Boolean function. One can represent a branching program in a similar way as a directed weighted graph$^{28}$, and so we omit the formal description of branching program representation.$^{29}$

EQ-1BP

Input: One-time-only branching programs $B_1$ and $B_2$ over a set of Boolean variables $X = \{x_1, x_2, x_3, \ldots\}$.

Output: “yes” if $B_1$ and $B_2$ are equivalent (represent the same Boolean function),

“no” otherwise.

28 Where not only the edges have some labels, but also the vertices are labeled.

29 Remember that the formal definition of branching programs was given in Section 2.2.3 (Definitions 2.2.3.19, 2.2.3.20, Figure 2.11).







SATISFIABILITY PROBLEM.

The satisfiability problem is to decide, for a given formula in CNF, whether it is satisfiable or not. Thus, the satisfiability problem is the decision problem $(Sat, \Sigma_{logic})$, where

$$Sat = \{w \in \Sigma_{logic}^* \mid w \text{ is a code of a satisfiable formula in CNF}\}.$$

We also consider specific subproblems of Sat where the length of clauses of the formulae in CNF is bounded. For every positive integer $k \geq 2$, we define the $k$-satisfiability problem as the decision problem $(kSat, \Sigma_{logic})$, where

$$kSat = \{w \in \Sigma_{logic}^* \mid w \text{ is a code of a satisfiable formula in } k\text{CNF}\}.$$
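A brute-force decision procedure for Sat makes the definition concrete. The Python sketch below assumes a DIMACS-like clause encoding (`+i` for $x_i$, `-i` for $\overline{x_i}$) instead of words over $\Sigma_{logic}$. It tries all $2^n$ assignments and therefore runs in time exponential in the number of variables. `is_satisfiable` is an illustrative name.

```python
from itertools import product


def is_satisfiable(clauses, n):
    """Decide satisfiability of a CNF over x_1..x_n by trying all 2^n assignments.
    Each clause is a list of nonzero ints: +i stands for x_i, -i for its negation."""
    for a in product((False, True), repeat=n):
        if all(any(a[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False


print(is_satisfiable([[1, -2], [2], [-1, 2, 3]], 3))          # prints: True
print(is_satisfiable([[1, 2], [-1, 2], [1, -2], [-1, -2]], 2))  # prints: False
```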

In what follows we define some decision problems from graph theory.

CLIQUE PROBLEM.

The clique problem is to decide, for a given graph $G$ and a positive integer $k$, whether $G$ contains a clique of size $k$ (i.e., whether the complete graph $K_k$ of $k$ vertices is a subgraph of $G$). Formally, the clique problem is the decision problem $(Clique, \{0, 1, \#\})$, where

$$Clique = \{x \# w \in \{0, 1, \#\}^* \mid x \in \{0, 1\}^* \text{ and } w \text{ represents a graph that contains a clique of size } Number(x)\}.$$

An equivalent description of the clique problem is the following one.

Clique Problem

Input: A positive integer $k$ and a graph $G$.

Output: “yes” if $G$ contains a clique of size $k$,

“no” otherwise.

VERTEX COVER PROBLEM.

The vertex cover problem is to decide, for a given graph $G$ and a positive integer $k$, whether $G$ contains a vertex cover of cardinality $k$. Remember that a vertex cover of $G = (V, E)$ is any set $S$ of vertices of $G$ such that each edge from $E$ is incident to at least one vertex in $S$.

Formally, the vertex cover problem (VCP) is the decision problem $(VCP, \{0, 1, \#\})$, where

$$VCP = \{u \# w \in \{0, 1, \#\}^+ \mid u \in \{0, 1\}^+ \text{ and } w \text{ represents a graph that contains a vertex cover of size } Number(u)\}.$$







HAMILTONIAN CYCLE PROBLEM. 

The Hamiltonian cycle problem is to determine, for a given graph G, whether 
G contains a Hamiltonian cycle or not. Remember that a Hamiltonian cycle 
of G of n vertices is a cycle of length n in G that contains every vertex of G. 

Formally, the Hamiltonian cycle problem (HC) is the decision problem 
(HC, {0,1,#}), where 

$$HC = \{w \in \{0, 1, \#\}^* \mid w \text{ represents a graph that contains a Hamiltonian cycle}\}.$$

EXISTENCE PROBLEMS IN LINEAR PROGRAMMING.

Here, we consider problems of deciding whether a given system of linear equations has a solution. Following the notation of Section 2.2.1, a system of linear equations is given by the equality

$$A \cdot X = b,$$

where $A = [a_{ij}]_{i=1,\ldots,m,\, j=1,\ldots,n}$ is an $m \times n$ matrix, $X = (x_1, x_2, \ldots, x_n)^T$, and $b = (b_1, \ldots, b_m)^T$ is an $m$-dimensional column vector. The $n$ elements $x_1, x_2, \ldots, x_n$ of $X$ are called unknowns (variables). In what follows we assume that all elements of $A$ and $b$ are integers. Remember that

$$Sol(A, b) = \{X \in \mathbb{R}^n \mid A \cdot X = b\}$$

denotes the set of all real-valued solutions of the system of linear equations $A \cdot X = b$. In what follows we are interested in deciding, for given $A$ and $b$, whether $Sol(A, b)$ is empty or not, i.e., whether there exists a solution to $A \cdot X = b$. More precisely, we consider several specific decision problems. They restrict the set $Sol(A, b)$ to solutions in $\mathbb{Z}^n$ or in $\{0, 1\}^n$ only, or even consider the linear equations over some finite fields instead of $\mathbb{R}$. Let

$$Sol_S(A, b) = \{X \in S^n \mid A \cdot X = b\}$$

for any subset $S$ of $\mathbb{R}$.

First of all, observe that the problem of deciding whether $Sol(A, b) = \emptyset$ is one of the fundamental tasks of linear algebra, and that it can be solved efficiently. The situation essentially changes if one searches for integer solutions or Boolean solutions. Let $\langle A, b \rangle$ denote a representation of a matrix $A$ and a vector $b$ over the alphabet $\{0, 1, \#\}$, assuming all elements of $A$ and $b$ are integers.

The problem of the existence of a solution of linear integer programming is to decide whether $Sol_{\mathbb{Z}}(A, b) = \emptyset$ for given $A$ and $b$. Formally, this decision problem is $(Sol\text{-}IP, \{0, 1, \#\})$, where

$$Sol\text{-}IP = \{\langle A, b \rangle \in \{0, 1, \#\}^* \mid Sol_{\mathbb{Z}}(A, b) \neq \emptyset\}.$$







The problem of the existence of a solution of 0/1-linear programming is to decide whether $Sol_{\{0,1\}}(A, b) = \emptyset$ for given $A$ and $b$. Formally, this decision problem is $(Sol\text{-}0/1\text{-}IP, \{0, 1, \#\})$, where

$$Sol\text{-}0/1\text{-}IP = \{\langle A, b \rangle \in \{0, 1, \#\}^* \mid Sol_{\{0,1\}}(A, b) \neq \emptyset\}.$$

All existence problems mentioned above consider computing over the field $\mathbb{R}$. We are also interested in solving the system of linear equations $A \cdot X = b$ over a finite field $\mathbb{Z}_p$ for a prime $p$. Then all elements of $A$ and $b$ are from $\mathbb{Z}_p = \{0, 1, \ldots, p - 1\}$, and all solutions have to be from $(\mathbb{Z}_p)^n$. The linear equations are congruences modulo $p$, i.e., the operation of addition is $\oplus_{\bmod p}$ and the operation of multiplication is $\odot_{\bmod p}$.

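The problem of the existence of a solution of linear programming modulo $p$ is the decision problem $(Sol\text{-}IP_p, \{0, 1, \ldots, p - 1, \#\})$, where

$$Sol\text{-}IP_p = \{\langle A, b \rangle \in \{0, 1, \ldots, p - 1, \#\}^* \mid \text{if } A \text{ is an } m \times n \text{ matrix over } \mathbb{Z}_p,\ m, n \in \mathbb{N} - \{0\}, \text{ and } b \in \mathbb{Z}_p^m, \text{ then there exists } X \in (\mathbb{Z}_p)^n \text{ such that } A \cdot X \equiv b \pmod{p}\}.$$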
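For tiny instances, membership in Sol-IP$_p$ can be checked by exhaustive search over $(\mathbb{Z}_p)^n$. The Python sketch below assumes $A$ and $b$ are already decoded into lists of integers; the coding $\langle A, b \rangle$ is not modeled. It enumerates all $p^n$ candidate vectors and is meant only to illustrate the definition. Since $\mathbb{Z}_p$ is a field, Gaussian elimination modulo $p$ would decide the same question efficiently. `solvable_mod_p` is an illustrative name.

```python
from itertools import product


def solvable_mod_p(A, b, p):
    """True if some X in (Z_p)^n satisfies A X = b (mod p); exhaustive search."""
    m, n = len(A), len(A[0])
    for X in product(range(p), repeat=n):
        if all(sum(A[i][j] * X[j] for j in range(n)) % p == b[i] % p for i in range(m)):
            return True
    return False


# x + y = 1, x + 2y = 0 over Z_3 has the solution x = y = 2.
print(solvable_mod_p([[1, 1], [1, 2]], [1, 0], 3))  # prints: True
# x + y = 1, 2x + 2y = 0 over Z_3 is inconsistent.
print(solvable_mod_p([[1, 1], [2, 2]], [1, 0], 3))  # prints: False
```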

In what follows, we define some fundamental optimization problems. We start with a general framework that describes the formalism for the specification of optimization problems.

Roughly, a problem instance $x$ of an optimization problem specifies a set of constraints. These constraints unambiguously determine the set $\mathcal{M}(x)$ of feasible solutions for the problem instance $x$. Note that $\mathcal{M}(x)$ may be empty or infinite. The objective, determined by the specification of the problem, is to find a solution from $\mathcal{M}(x)$ that is the “best” one among all solutions in $\mathcal{M}(x)$. Note that there may exist several (even infinitely many) best solutions among the solutions in $\mathcal{M}(x)$.

Definition 2.3.2.2. An optimization problem is a 7-tuple $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$, where

(i) $\Sigma_I$ is an alphabet, called the input alphabet of $U$,

(ii) $\Sigma_O$ is an alphabet, called the output alphabet of $U$,

(iii) $L \subseteq \Sigma_I^*$ is the language of feasible problem instances,

(iv) $L_I \subseteq L$ is the language of the (actual) problem instances of $U$,

(v) $\mathcal{M}$ is a function from $L$ to $Pot(\Sigma_O^*)$,$^{30}$ and, for every $x \in L$, $\mathcal{M}(x)$ is called the set of feasible solutions for $x$,

(vi) $cost$ is the cost function that, for every pair $(u, x)$, where $u \in \mathcal{M}(x)$ for some $x \in L$, assigns a positive real number $cost(u, x)$,

(vii) $goal \in \{minimum, maximum\}$.

30 Remember that $Pot(S)$ is the set of all subsets of the set $S$, i.e., the power set of $S$.





For every $x \in L_I$, a feasible solution $y \in \mathcal{M}(x)$ is called optimal for $x$ and $U$ if

$$cost(y, x) = goal\{cost(z, x) \mid z \in \mathcal{M}(x)\}.$$

For an optimal solution $y \in \mathcal{M}(x)$, we denote $cost(y, x)$ by $Opt_U(x)$. $U$ is called a maximization problem if $goal = maximum$, and $U$ is a minimization problem if $goal = minimum$. In what follows, $Output_U(x) \subseteq \mathcal{M}(x)$ denotes the set of all optimal solutions for the instance $x$ of $U$.

An algorithm $A$ is consistent for $U$ if, for every $x \in L_I$, the output $A(x) \in \mathcal{M}(x)$. We say that an algorithm $B$ solves the optimization problem $U$ if

(i) $B$ is consistent for $U$, and

(ii) for every $x \in L_I$, $B(x)$ is an optimal solution for $x$ and $U$.

Let us explain the informal meaning of the formal definition of an optimization problem $U$ as a 7-tuple $(\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$. $\Sigma_I$ has the same meaning as the alphabet of decision problems, and it is used to code (represent) the inputs. Similarly, $\Sigma_O$ is the alphabet used to code outputs. On the level of algorithm design used here, we usually do not need to specify $\Sigma_I$ or $\Sigma_O$ and the coding of inputs and outputs. These details do not have any essential influence on the hardness of the problems considered. But this formal specification may be useful in the classification of optimization problems according to their computational difficulty, and especially for proving lower bounds on their polynomial-time approximability.

The language $L$ is the set of codes of all problem instances (inputs) for which $U$ is well defined. $L_I$ is the set of actual problem instances (inputs), and one measures the computational hardness of $U$ according to inputs of $L_I$. In general, one can simplify the definition of $U$ by removing $L$, and the definition will work as well as Definition 2.3.2.2. The reason to put this additional information into the definition of optimization problems is that the hardness of many optimization problems is very sensitive to the specification of the set of considered problem instances ($L_I$). Definition 2.3.2.2 enables one to conveniently measure the increase or decrease of the hardness of optimization problems according to the changes of $L_I$ for a fixed $L$.

Definition 2.3.2.3. Let $U_1 = (\Sigma_I, \Sigma_O, L, L_{I,1}, \mathcal{M}, cost, goal)$ and $U_2 = (\Sigma_I, \Sigma_O, L, L_{I,2}, \mathcal{M}, cost, goal)$ be two optimization problems. We say that $U_1$ is a subproblem of $U_2$ if $L_{I,1} \subseteq L_{I,2}$.

The function $\mathcal{M}$ is determined by the constraints given by the problem instances, and $\mathcal{M}(x)$ is the set of all objects (solutions) satisfying the constraints given by $x$. The cost function assigns the cost $cost(\alpha, x)$ to every solution $\alpha$ from $\mathcal{M}(x)$. If the input instance $x$ is fixed, we often use the short notation $cost(\alpha)$ instead of $cost(\alpha, x)$. If $goal = minimum$ [$= maximum$], then an optimal solution is any solution from $\mathcal{M}(x)$ with the minimal [maximal] cost.







To make the definitions of specific optimization problems transparent, we often leave out the specification of coding the data over $\Sigma_I$ and $\Sigma_O$. We define the problems simply by specifying

• the set of actual problem instances $L_I$,

• the constraints given by the input instances, and so $\mathcal{M}(x)$ for every $x \in L_I$,

• the cost function,

• the goal.

TRAVELING SALESPERSON PROBLEM. 

The traveling salesperson problem is the problem of finding a Hamiltonian cycle (tour) of minimal cost in a complete weighted graph. The formal definition follows.

Traveling Salesperson Problem (TSP) 

Input: A weighted complete graph $(G, c)$, where $G = (V, E)$ and $c : E \to \mathbb{N}$. Let $V = \{v_1, \ldots, v_n\}$ for some $n \in \mathbb{N} - \{0\}$.

Constraints: For every input instance $(G, c)$, $\mathcal{M}(G, c) = \{v_{i_1}, v_{i_2}, \ldots, v_{i_n}, v_{i_1} \mid (i_1, i_2, \ldots, i_n) \text{ is a permutation of } (1, 2, \ldots, n)\}$, i.e., the set of all Hamiltonian cycles of $G$.

Costs: For every Hamiltonian cycle $H = v_{i_1} v_{i_2} \ldots v_{i_n} v_{i_1} \in \mathcal{M}(G, c)$,

$cost((v_{i_1}, v_{i_2}, \ldots, v_{i_n}, v_{i_1}), (G, c)) = \sum_{j=1}^{n} c(\{v_{i_j}, v_{i_{(j \bmod n) + 1}}\})$,
i.e., the cost of every Hamiltonian cycle $H$ is the sum of the weights of all edges of $H$.

Goal: minimum. 

If one wants to specify $\Sigma_I$ and $\Sigma_O$, one can take $\{0, 1, \#\}$ for both. The input can be a code of the adjacency matrix of $(G, c)$, and the Hamiltonian cycles can be coded as permutations of the set of vertices.

The following adjacency matrix represents the problem instance of TSP depicted in Figure 2.12.

$$\begin{pmatrix}
0 & 1 & 1 & 3 & 8 \\
1 & 0 & 2 & 1 & 2 \\
1 & 2 & 0 & 7 & 1 \\
3 & 1 & 7 & 0 & 1 \\
8 & 2 & 1 & 1 & 0
\end{pmatrix}$$

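Observe that there are $4!/2 = 12$ Hamiltonian tours in $K_5$. The cost of the Hamiltonian tour $H = v_1, v_2, v_3, v_4, v_5, v_1$ is

$cost(H) = c(\{v_1, v_2\}) + c(\{v_2, v_3\}) + c(\{v_3, v_4\}) + c(\{v_4, v_5\}) + c(\{v_5, v_1\}) = 1 + 2 + 7 + 1 + 8 = 19$.

The unique optimal Hamiltonian tour is

$H_{opt} = v_1, v_2, v_4, v_5, v_3, v_1$ with $cost(H_{opt}) = 5$.

It is optimal because every tour has five edges of weight at least 1, and it is unique because it consists of exactly the five edges of weight 1.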
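For an instance this small, the claims above can be checked by exhaustive search. The following Python sketch is only an illustration (not an algorithm from the text): it encodes the adjacency matrix of Figure 2.12 with 0-based indices, so that index i stands for vertex $v_{i+1}$, enumerates one representative of every Hamiltonian cycle of $K_5$, and evaluates the cost function of TSP.

```python
from itertools import permutations

# Adjacency matrix of the TSP instance of Figure 2.12 (index i = vertex v_{i+1}).
C = [
    [0, 1, 1, 3, 8],
    [1, 0, 2, 1, 2],
    [1, 2, 0, 7, 1],
    [3, 1, 7, 0, 1],
    [8, 2, 1, 1, 0],
]

def tour_cost(tour, c):
    """Sum of c({v_{i_j}, v_{i_{(j mod n)+1}}}) for j = 1..n, closing the cycle."""
    n = len(tour)
    return sum(c[tour[j]][tour[(j + 1) % n]] for j in range(n))

def all_tours(n):
    """One representative per Hamiltonian cycle of K_n: start at vertex 0 and
    keep only one of the two orientations, which gives (n-1)!/2 tours."""
    for rest in permutations(range(1, n)):
        if rest[0] < rest[-1]:
            yield (0,) + rest

tours = list(all_tours(len(C)))
best = min(tours, key=lambda t: tour_cost(t, C))
print(len(tours))                      # 12
print(tour_cost((0, 1, 2, 3, 4), C))   # 19
print(best, tour_cost(best, C))        # (0, 1, 3, 4, 2) 5
```

Since the number of tours grows as $(n-1)!/2$, this kind of enumeration is usable only for very small $n$.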




2.3 Fundamentals of Algorithmics 105 




Fig. 2.12. 



Now we define two subproblems of TSP. 

The metric traveling salesperson problem, Δ-TSP, is a subproblem of TSP such that every problem instance $(G, c)$ of Δ-TSP satisfies the triangle inequality

$c(\{u, v\}) \le c(\{u, w\}) + c(\{w, v\})$

for all vertices $u, w, v$ of $G$.

The problem instance depicted in Figure 2.12 does not satisfy the triangle inequality because

$7 = c(\{v_3, v_4\}) > c(\{v_3, v_5\}) + c(\{v_5, v_4\}) = 1 + 1 = 2$.
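Whether an instance belongs to Δ-TSP can be decided by testing all triples of distinct vertices. The sketch below (an illustration; the function name is ours) lists every ordered triple $(u, w, v)$ that violates the triangle inequality for the matrix of Figure 2.12, again with index i standing for $v_{i+1}$.

```python
from itertools import permutations

# Adjacency matrix of Figure 2.12; index i stands for vertex v_{i+1}.
C = [
    [0, 1, 1, 3, 8],
    [1, 0, 2, 1, 2],
    [1, 2, 0, 7, 1],
    [3, 1, 7, 0, 1],
    [8, 2, 1, 1, 0],
]

def triangle_violations(c):
    """Return all ordered triples (u, w, v) of distinct vertices with
    c({u, v}) > c({u, w}) + c({w, v})."""
    n = len(c)
    return [(u, w, v) for u, w, v in permutations(range(n), 3)
            if c[u][v] > c[u][w] + c[w][v]]

violations = triangle_violations(C)
print((2, 4, 3) in violations)   # True: the violation v3, v5, v4 given in the text
print(len(violations) == 0)      # False, so this instance does not belong to Δ-TSP
```

The test needs $O(n^3)$ comparisons, so membership in Δ-TSP is easy to check even though TSP itself is hard.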

The geometrical traveling salesperson problem (Euclidean TSP) is a subproblem of TSP such that, for every problem instance $(G, c)$, the vertices of $G$ can be embedded in the two-dimensional Euclidean space in such a way that $c(\{u, v\})$ is the Euclidean distance between the points assigned to the vertices $u$ and $v$ for all $u, v$ of $G$. A simplified specification of the set of input instances of the geometrical TSP is to say that the input is a set of points in the plane and the cost of the connection between any two points is defined by their Euclidean distance.

Since the two-dimensional Euclidean space is a metric space, the Euclidean distance satisfies the triangle inequality, and so the geometrical TSP is a subproblem of Δ-TSP.

MAKESPAN SCHEDULING PROBLEM. 

The problem of makespan scheduling (MS) is to schedule n jobs with designated processing times on m identical machines in such a way that the time needed to finish all jobs (the makespan) is minimized. Formally, we define MS as follows.







Makespan Scheduling Problem (MS) 

Input: Positive integers $p_1, p_2, \ldots, p_n$ and an integer $m \ge 2$ for some $n \in \mathbb{N} - \{0\}$.
{$p_i$ is the processing time of the $i$th job on any of the $m$ available machines}.

Constraints: For every input instance $(p_1, \ldots, p_n, m)$ of MS,
$\mathcal{M}(p_1, \ldots, p_n, m) = \{S_1, S_2, \ldots, S_m \mid S_i \subseteq \{1, 2, \ldots, n\} \text{ for } i = 1, \ldots, m,\ \bigcup_{k=1}^{m} S_k = \{1, 2, \ldots, n\}, \text{ and } S_i \cap S_j = \emptyset \text{ for } i \neq j\}$.
{$\mathcal{M}(p_1, \ldots, p_n, m)$ contains all partitions of $\{1, 2, \ldots, n\}$ into $m$ subsets. The meaning of $(S_1, S_2, \ldots, S_m)$ is that, for $i = 1, \ldots, m$, the jobs with indices from $S_i$ have to be processed on the $i$th machine}.

Costs: For each $(S_1, S_2, \ldots, S_m) \in \mathcal{M}(p_1, \ldots, p_n, m)$,
$cost((S_1, \ldots, S_m), (p_1, \ldots, p_n, m)) = \max\{\sum_{l \in S_i} p_l \mid i = 1, \ldots, m\}$.

Goal: minimum.

An example of scheduling seven jobs with the processing times 3, 2, 4, 1, 3, 3, 6, respectively, on 4 machines is given in Figure 4.1 in Section 4.2.1.
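The instance of this example is small enough for exhaustive search. The sketch below represents a feasible solution $(S_1, \ldots, S_m)$ by the machine index assigned to each job (an equivalent encoding chosen here for convenience; empty sets $S_i$ are allowed, as in the definition) and returns the minimal cost. Since the total processing time is 22 and the longest job takes 6, no schedule on 4 machines can beat $\max(6, \lceil 22/4 \rceil) = 6$, and the search confirms that 6 is reached (for instance by grouping the processing times as 6 | 4 + 2 | 3 + 3 | 3 + 1).

```python
from itertools import product

def makespan(assignment, p, m):
    """cost = max over machines i of the sum of p_l over the jobs l assigned to i."""
    loads = [0] * m
    for job, machine in enumerate(assignment):
        loads[machine] += p[job]
    return max(loads)

def optimal_makespan(p, m):
    """Exhaustive search over all m**n job-to-machine assignments."""
    return min(makespan(a, p, m) for a in product(range(m), repeat=len(p)))

p = [3, 2, 4, 1, 3, 3, 6]
print(optimal_makespan(p, 4))   # 6
```

The search inspects $m^n$ assignments, which is why MS is treated with approximation algorithms later in the book.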

COVER PROBLEMS. 

Here, we define the minimum vertex cover problem³¹ (Min-VCP), its weighted version, and the set cover problem (SCP). The minimum vertex cover problem is to cover all edges of a given graph $G$ with a minimal number of vertices of $G$.

Minimum Vertex Cover Problem (MIN-VCP) 

Input: A graph $G = (V, E)$.

Constraints: $\mathcal{M}(G) = \{S \subseteq V \mid \text{every edge of } E \text{ is incident to at least one vertex of } S\}$.

Cost: For every $S \in \mathcal{M}(G)$, $cost(S, G) = |S|$.

Goal: minimum. 

Consider the graph $G$ given in Figure 2.13.

$\mathcal{M}(G) = \{\{v_1, v_2, v_3, v_4, v_5\}, \{v_1, v_2, v_3, v_4\}, \{v_1, v_2, v_3, v_5\}, \{v_1, v_2, v_4, v_5\}, \{v_1, v_3, v_4, v_5\}, \{v_2, v_3, v_4, v_5\}, \{v_1, v_3, v_4\}, \{v_2, v_4, v_5\}, \{v_2, v_3, v_5\}\}$.

The optimal solutions are $\{v_1, v_3, v_4\}$, $\{v_2, v_4, v_5\}$, and $\{v_2, v_3, v_5\}$, and so $Opt_{\text{Min-VCP}}(G) = 3$. Observe that there is no vertex cover of cardinality 2 because one needs at least three vertices to cover the edges of the cycle $v_1, v_2, v_3, v_4, v_5, v_1$ (every vertex covers only two of its five edges).

31 Observe that we have two versions of the vertex cover problem. One version is the decision problem defined by the language VCP above, and the second version, Min-VCP, is the minimization problem considered here.

Fig. 2.13.
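The set $\mathcal{M}(G)$ above can be reproduced by brute force. In the sketch below the edge set of Figure 2.13 is the one that can be read off from the sets $S_v$ listed later for the set cover problem; the code tests every subset of $V$ against the constraint of Min-VCP.

```python
from itertools import combinations

V = [1, 2, 3, 4, 5]
# Edges of the graph of Figure 2.13 (vertex i stands for v_i).
E = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]

def is_vertex_cover(S, edges):
    """Every edge must be incident to at least one vertex of S."""
    return all(u in S or v in S for u, v in edges)

covers = [set(S) for r in range(len(V) + 1)
          for S in combinations(V, r) if is_vertex_cover(set(S), E)]
opt = min(len(S) for S in covers)
print(len(covers), opt)                               # 9 3
print([sorted(S) for S in covers if len(S) == opt])   # [[1, 3, 4], [2, 3, 5], [2, 4, 5]]
```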

Set Cover Problem (SCP) 

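Input: $(X, \mathcal{F})$, where $X$ is a finite set and $\mathcal{F} \subseteq Pot(X)$ such that $X = \bigcup_{S \in \mathcal{F}} S$.

Constraints: For every input $(X, \mathcal{F})$,

$\mathcal{M}(X, \mathcal{F}) = \{C \subseteq \mathcal{F} \mid X = \bigcup_{S \in C} S\}$.

Costs: For every $C \in \mathcal{M}(X, \mathcal{F})$, $cost(C, (X, \mathcal{F})) = |C|$.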

Goal: minimum. 

Later we shall observe that Min-VCP can be viewed as a special subproblem of SCP because, for a given graph $G = (V, E)$, one can assign the set $S_v$ of all edges incident to $v$ to every vertex $v$ of $G$. For the graph in Figure 2.13 this results in the instance $(E, \mathcal{F})$ of SCP, where

$\mathcal{F} = \{S_{v_1}, S_{v_2}, S_{v_3}, S_{v_4}, S_{v_5}\}$,

$S_{v_1} = \{\{v_1, v_2\}, \{v_1, v_5\}\}$, $S_{v_2} = \{\{v_1, v_2\}, \{v_2, v_3\}, \{v_2, v_4\}\}$,

$S_{v_3} = \{\{v_3, v_2\}, \{v_3, v_5\}, \{v_3, v_4\}\}$, $S_{v_4} = \{\{v_3, v_4\}, \{v_2, v_4\}, \{v_4, v_5\}\}$, and

$S_{v_5} = \{\{v_1, v_5\}, \{v_3, v_5\}, \{v_4, v_5\}\}$.
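The transformation just described is easy to mechanize. In the following sketch (function and variable names are ours) every vertex $v$ is mapped to the set $S_v$ of its incident edges, and a brute-force search finds a smallest subfamily of $\mathcal{F}$ covering $E$. The chosen subfamily corresponds to the vertex cover $\{v_1, v_3, v_4\}$.

```python
from itertools import combinations

V = [1, 2, 3, 4, 5]
E = [frozenset(e) for e in [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]]

# The reduction: vertex v becomes the set S_v of all edges incident to v.
F = {v: frozenset(e for e in E if v in e) for v in V}

def min_set_cover(X, family):
    """Smallest tuple of names C with union of family[s] (s in C) equal to X; brute force."""
    names = list(family)
    for r in range(len(names) + 1):
        for C in combinations(names, r):
            if set().union(*(family[s] for s in C)) == X:
                return C
    return None

print(sorted(tuple(sorted(e)) for e in F[2]))   # [(1, 2), (2, 3), (2, 4)]
print(min_set_cover(set(E), F))                  # (1, 3, 4)
```

A set cover $\{S_v \mid v \in U\}$ of $E$ covers every edge exactly when $U$ is a vertex cover of $G$, so both problems have the same optimal cost 3 here.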

The last cover problem that we consider is the weighted generalization of 
Min-VCP. 

Weighted Minimum Vertex Cover Problem (WEIGHT-VCP) 

Input: A weighted graph $G = (V, E, c)$, $c : V \to \mathbb{N} - \{0\}$.

Constraints: For every input instance $G = (V, E, c)$,

$\mathcal{M}(G) = \{S \subseteq V \mid S \text{ is a vertex cover of } G\}$.

Cost: For every $S \in \mathcal{M}(G)$, $G = (V, E, c)$,

$cost(S, (V, E, c)) = \sum_{v \in S} c(v)$.

Goal: minimum. 







MAXIMUM CLIQUE PROBLEM. 

The maximum clique problem (Max-CL) is to find a clique of maximal size in a given graph $G$.

Maximum Clique Problem (Max-CL)

Input: A graph $G = (V, E)$.

Constraints: $\mathcal{M}(G) = \{S \subseteq V \mid \{\{u, v\} \mid u, v \in S, u \neq v\} \subseteq E\}$.

{$\mathcal{M}(G)$ contains all complete subgraphs (cliques) of $G$}

Costs: For every $S \in \mathcal{M}(G)$, $cost(S, G) = |S|$.

Goal: maximum. 

To present a specific input instance, consider the graph $G$ depicted in Figure 2.13.

$\mathcal{M}(G) = \{\{v_1\}, \{v_2\}, \{v_3\}, \{v_4\}, \{v_5\}, \{v_1, v_2\}, \{v_1, v_5\}, \{v_2, v_3\}, \{v_2, v_4\}, \{v_3, v_4\}, \{v_3, v_5\}, \{v_4, v_5\}, \{v_2, v_3, v_4\}, \{v_3, v_4, v_5\}\}$.

The optimal solutions are $\{v_2, v_3, v_4\}$ and $\{v_3, v_4, v_5\}$, and so $Opt_{\text{Max-CL}}(G) = 3$.
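The list of cliques can be reproduced by testing the constraint of Max-CL for every nonempty subset of $V$. The sketch below uses the edge set of Figure 2.13 as read off from the sets $S_v$ given earlier; like the listing above, it leaves out the empty set, which satisfies the constraint trivially.

```python
from itertools import combinations

V = [1, 2, 3, 4, 5]
E = {frozenset(e) for e in [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]}

def is_clique(S, edges):
    """True iff {{u, v} | u, v in S, u != v} is a subset of E."""
    return all(frozenset((u, v)) in edges for u, v in combinations(S, 2))

cliques = [S for r in range(1, len(V) + 1)
           for S in combinations(V, r) if is_clique(S, E)]
print(len(cliques))                          # 14
print(max(len(S) for S in cliques))          # 3
print([S for S in cliques if len(S) == 3])   # [(2, 3, 4), (3, 4, 5)]
```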

CUT PROBLEMS. 

We introduce the maximum cut problem (Max-Cut) and the minimum cut problem (Min-Cut). Remember that a cut of a graph $G = (V, E)$ is any partition of $V$ into $(V_1, V_2)$ such that $V_1 \cup V_2 = V$ and $V_1 \cap V_2 = \emptyset$.

Maximum Cut Problem (Max-Cut)

Input: A graph $G = (V, E)$.

Constraints:

$\mathcal{M}(G) = \{(V_1, V_2) \mid V_1 \cup V_2 = V,\ V_1 \neq \emptyset \neq V_2, \text{ and } V_1 \cap V_2 = \emptyset\}$.

Costs: For every cut $(V_1, V_2) \in \mathcal{M}(G)$,

$cost((V_1, V_2), G) = |E \cap \{\{u, v\} \mid u \in V_1, v \in V_2\}|$.

Goal: maximum. 

The minimum cut problem (Min-Cut) can be defined in the same way as Max-Cut. The only difference is that the goal of Min-Cut is minimum.

Up to exchanging $V_1$ and $V_2$, the only optimal solution of Min-Cut for the graph $G$ in Figure 2.13 is $(\{v_1\}, \{v_2, v_3, v_4, v_5\})$, and the only optimal solution of Max-Cut for $G$ is $(\{v_1, v_3, v_4\}, \{v_2, v_5\})$, which cuts all edges except $\{v_3, v_4\}$. No cut can contain all seven edges, because $G$ contains the triangle $v_2, v_3, v_4$.

So, $Opt_{\text{Min-Cut}}(G) = 2$ and $Opt_{\text{Max-Cut}}(G) = 6$.
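Both optimal values for Figure 2.13 can be checked by enumerating all cuts with nonempty parts. The sketch below uses the edge set read off from the sets $S_v$ given earlier; each cut is generated twice (once as $V_1$ and once as its complement), which does not affect the minimum or the maximum.

```python
from itertools import combinations

V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]

def cut_size(V1, edges):
    """|E ∩ {{u, v} | u in V1, v in V2}| with V2 = V - V1."""
    return sum((u in V1) != (v in V1) for u, v in edges)

# All choices of V1 with both V1 and V2 = V - V1 nonempty.
cuts = [set(V1) for r in range(1, len(V)) for V1 in combinations(V, r)]
values = [cut_size(V1, E) for V1 in cuts]
print(min(values), max(values))                                    # 2 6
print([sorted(V1) for V1, val in zip(cuts, values) if val == 6])   # [[2, 5], [1, 3, 4]]
```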







KNAPSACK PROBLEM. 

First, we define the simple knapsack problem (SKP). This optimization task can be described as follows. One has a knapsack whose weight capacity is bounded by a positive integer $b$ (for instance, by $b$ pounds) and $n$ objects of weights $w_1, w_2, \ldots, w_n$, $n \in \mathbb{N} - \{0\}$. The aim is to pack some objects in the knapsack in such a way that the contents of the knapsack are as heavy as possible but not above $b$.

Simple Knapsack Problem (SKP) 

Input: A positive integer $b$, and positive integers $w_1, w_2, \ldots, w_n$ for some $n \in \mathbb{N} - \{0\}$.

Constraints: $\mathcal{M}(b, w_1, w_2, \ldots, w_n) = \{T \subseteq \{1, \ldots, n\} \mid \sum_{i \in T} w_i \le b\}$,
i.e., a feasible solution for the problem instance $b, w_1, w_2, \ldots, w_n$ is every set of objects whose total weight does not exceed $b$.

Costs: For each $T \in \mathcal{M}(b, w_1, w_2, \ldots, w_n)$,

$cost(T, b, w_1, w_2, \ldots, w_n) = \sum_{i \in T} w_i$.

Goal: maximum.

For the problem instance $I = (b, w_1, \ldots, w_5)$, where $b = 29$, $w_1 = 3$, $w_2 = 6$, $w_3 = 8$, $w_4 = 7$, $w_5 = 12$, the only optimal solution is $T = \{1, 2, 3, 5\}$ with $cost(T, I) = 29$. If one considers the problem instance $I' = (b', w_1, \ldots, w_5)$ with $b' = 14$, then the optimal solution is $T' = \{2, 3\}$ with $cost(T', I') = 14$.
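Both instances can be solved by listing all feasible subsets. The sketch below is a brute-force illustration with 1-based object indices, as in the text; it returns the optimal cost together with all optimal sets, which also confirms the uniqueness claims above.

```python
from itertools import combinations

def skp_optimal(b, w):
    """All optimal T ⊆ {1..n} maximizing sum of w_i subject to sum of w_i <= b."""
    n = len(w)

    def weight(T):
        return sum(w[i - 1] for i in T)

    feasible = [T for r in range(n + 1) for T in combinations(range(1, n + 1), r)
                if weight(T) <= b]
    best = max(weight(T) for T in feasible)
    return best, [T for T in feasible if weight(T) == best]

w = [3, 6, 8, 7, 12]
print(skp_optimal(29, w))   # (29, [(1, 2, 3, 5)])
print(skp_optimal(14, w))   # (14, [(2, 3)])
```

The enumeration examines all $2^n$ subsets, so it only serves for checking small examples.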

The instances of the general knapsack problem additionally contain a cost $c_i$ for every object $i$. The objective is to maximize the total cost of the objects packed into the knapsack³² while satisfying the constraint $b$ on the weight of the knapsack.

Knapsack Problem (KP) 

Input: A positive integer $b$, and $2n$ positive integers $w_1, w_2, \ldots, w_n, c_1, c_2, \ldots, c_n$ for some $n \in \mathbb{N} - \{0\}$.

Constraints:

$\mathcal{M}(b, w_1, \ldots, w_n, c_1, \ldots, c_n) = \{T \subseteq \{1, \ldots, n\} \mid \sum_{i \in T} w_i \le b\}$.

Costs: For each $T \in \mathcal{M}(b, w_1, \ldots, w_n, c_1, \ldots, c_n)$,

$cost(T, b, w_1, \ldots, w_n, c_1, \ldots, c_n) = \sum_{i \in T} c_i$.

Goal: maximum. 

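Consider the problem instance $I$ determined by $b = 59$, $w_1 = 12$, $c_1 = 9$, $w_2 = 5$, $c_2 = 4$, $w_3 = 13$, $c_3 = 5$, $w_4 = 18$, $c_4 = 9$, $w_5 = 15$, $c_5 = 9$, $w_6 = 29$, $c_6 = 22$. An optimal solution is $T = \{1, 5, 6\}$. Observe that $T$ satisfies the constraint because $w_1 + w_5 + w_6 = 12 + 15 + 29 = 56 \le 59 = b$ and that

$Opt_{\text{KP}}(I) = c_1 + c_5 + c_6 = 9 + 9 + 22 = 40$.

The sets $\{1, 4, 6\}$ and $\{1, 2, 3, 6\}$ (both of weight 59) reach the same cost 40, so the optimal solution of this instance is not unique.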

32 Rather than their weights. 
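The same brute-force idea works for KP; only the objective changes from weights to costs. The following sketch (an illustration, not an algorithm from the text) solves the instance $I$ above and lists every optimal set, using 1-based object indices.

```python
from itertools import combinations

def kp_optimal(b, w, c):
    """All T ⊆ {1..n} maximizing sum of c_i subject to sum of w_i <= b."""
    n = len(w)

    def weight(T):
        return sum(w[i - 1] for i in T)

    def value(T):
        return sum(c[i - 1] for i in T)

    feasible = [T for r in range(n + 1) for T in combinations(range(1, n + 1), r)
                if weight(T) <= b]
    best = max(value(T) for T in feasible)
    return best, [T for T in feasible if value(T) == best]

w = [12, 5, 13, 18, 15, 29]
c = [9, 4, 5, 9, 9, 22]
print(kp_optimal(59, w, c))   # (40, [(1, 4, 6), (1, 5, 6), (1, 2, 3, 6)])
```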





BIN-PACKING PROBLEM. 

The bin-packing problem (Bin-P) is similar to the knapsack problem. One has $n$ objects of rational weights $w_1, \ldots, w_n \in [0, 1]$. The goal is to distribute them among knapsacks (bins) of unit size 1 in such a way that a minimal number of knapsacks (bins) is used.

Bin-Packing Problem (BIN-P) 

Input: $n$ rational numbers $w_1, w_2, \ldots, w_n \in [0, 1]$ for some positive integer $n$.

Constraints: $\mathcal{M}(w_1, \ldots, w_n) = \{S \subseteq \{0, 1\}^n \mid \text{for every } s \in S,\ s^T \cdot (w_1, w_2, \ldots, w_n) \le 1, \text{ and } \sum_{s \in S} s = (1, 1, \ldots, 1)\}$.

{If $S = \{s_1, s_2, \ldots, s_m\}$, then $s_i = (s_{i1}, s_{i2}, \ldots, s_{in})$ determines the set of objects packed in the $i$th bin. The $j$th object is packed into the $i$th bin if and only if $s_{ij} = 1$. The constraint

$s_i^T \cdot (w_1, \ldots, w_n) \le 1$

assures that the $i$th bin is not overfilled. The constraint

$\sum_{s \in S} s = (1, 1, \ldots, 1)$

assures that every object is packed in exactly one bin.}

Cost: For every $S \in \mathcal{M}(w_1, w_2, \ldots, w_n)$,

$cost(S, (w_1, \ldots, w_n)) = |S|$.

Goal: minimum. 

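Observe that an alternative way to describe the constraints of Bin-P is to take

$\mathcal{M}(w_1, \ldots, w_n) = \{(T_1, T_2, \ldots, T_m) \mid m \in \mathbb{N} - \{0\},\ T_i \subseteq \{1, 2, \ldots, n\} \text{ for } i = 1, \ldots, m,\ T_i \cap T_j = \emptyset \text{ for } i \neq j,\ \bigcup_{i=1}^{m} T_i = \{1, 2, \ldots, n\}, \text{ and } \sum_{k \in T_j} w_k \le 1 \text{ for } j = 1, \ldots, m\}$.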
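The alternative description translates directly into a feasibility test. The sketch below checks the constraints for a tuple $(T_1, \ldots, T_m)$ and, for very small inputs only, finds the minimal number of bins by trying every assignment of objects to bins. The weights are a hypothetical example, and `Fraction` is used so that the rational weights of Bin-P are represented exactly.

```python
from fractions import Fraction
from itertools import product

def is_feasible_packing(T, w):
    """(T_1,...,T_m) must partition {1..n}, and every bin load must be at most 1."""
    n = len(w)
    items = [k for Tj in T for k in Tj]
    return (sorted(items) == list(range(1, n + 1))
            and all(sum(w[k - 1] for k in Tj) <= 1 for Tj in T))

def min_bins(w):
    """Brute force over all maps object -> bin index; keep the fewest nonempty bins."""
    n = len(w)
    best = n                                  # one object per bin is always feasible
    for a in product(range(n), repeat=n):
        T = [[k + 1 for k in range(n) if a[k] == j] for j in range(n)]
        T = [Tj for Tj in T if Tj]            # drop empty bins
        if is_feasible_packing(T, w):
            best = min(best, len(T))
    return best

# Hypothetical instance: weights 1/2, 1/2, 1/3, 1/3, 1/3 (total weight 2).
w = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
print(is_feasible_packing([[1, 2], [3, 4, 5]], w))   # True
print(min_bins(w))                                    # 2
```

Since the total weight is 2, at least two bins are needed, and the packing $T_1 = \{1, 2\}$, $T_2 = \{3, 4, 5\}$ shows that two suffice.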

MAXIMUM SATISFIABILITY PROBLEM. 

The general maximum satisfiability problem (Max-Sat) is to find an assignment to the variables of a formula $\Phi$ such that the number of satisfied clauses is maximized.







Maximum Satisfiability Problem (Max-Sat) 

Input: A formula $\Phi = F_1 \wedge F_2 \wedge \cdots \wedge F_m$ over $X = \{x_1, x_2, \ldots\}$ in CNF (an equivalent description of this instance of Max-Sat is to consider the set of clauses $F_1, F_2, \ldots, F_m$).

Constraints: For every formula $\Phi$ over the set $\{x_1, \ldots, x_n\} \subseteq X$, $n \in \mathbb{N} - \{0\}$, $\mathcal{M}(\Phi) = \{0, 1\}^n$.

{Every assignment of values to $\{x_1, \ldots, x_n\}$ is a feasible solution, i.e., $\mathcal{M}(\Phi)$ can also be written as $\{\alpha \mid \alpha : X \to \{0, 1\}\}$.}

Costs: For every $\Phi$ in CNF, and every $\alpha \in \mathcal{M}(\Phi)$,

$cost(\alpha, \Phi)$ is the number of clauses satisfied by $\alpha$.

Goal: maximum. 

Observe that, if $\Phi$ is a satisfiable formula, an optimal solution is any assignment $\alpha$ that satisfies $\Phi$ (i.e., $cost(\alpha, \Phi) = m$ if $\Phi$ consists of $m$ clauses).

We consider several subproblems of Max-Sat here. For every integer $k \ge 2$, we define the Max-kSat problem as a subproblem of Max-Sat, where the problem instances are formulae in kCNF³³. For every integer $k \ge 2$, we define Max-EkSat as a subproblem of Max-kSat, where the inputs are formulae consisting of clauses of size exactly $k$ only. Each clause $l_1 \vee l_2 \vee \cdots \vee l_k$ of such a formula is a Boolean function over exactly $k$ variables, i.e., $l_i \neq l_j$ and $l_i \neq \overline{l_j}$ for all $i, j \in \{1, \ldots, k\}$, $i \neq j$.
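Because $\mathcal{M}(\Phi) = \{0, 1\}^n$, a small instance of Max-Sat can be solved by trying all $2^n$ assignments. The sketch below assumes a DIMACS-like clause encoding (a clause is a list of nonzero integers, $j$ standing for $x_j$ and $-j$ for $\overline{x_j}$); this encoding and the formula are illustrative choices, not taken from the text.

```python
from itertools import product

def satisfied(clause, alpha):
    """A clause is satisfied if some literal is true under alpha (alpha[j-1] is x_j)."""
    return any(alpha[abs(l) - 1] == (l > 0) for l in clause)

def max_sat(clauses, n):
    """Exhaustive search over {0,1}^n; returns (optimal cost, one optimal assignment)."""
    best = max(product((False, True), repeat=n),
               key=lambda alpha: sum(satisfied(C, alpha) for C in clauses))
    return sum(satisfied(C, best) for C in clauses), best

# Hypothetical unsatisfiable 2CNF formula (x1 ∨ x2) ∧ (¬x1) ∧ (¬x2).
phi = [[1, 2], [-1], [-2]]
print(max_sat(phi, 2)[0])   # 2
```

Since this formula is unsatisfiable, no assignment reaches the cost 3, and the optimum of the Max-Sat instance is 2.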

LINEAR PROGRAMMING. 

First, we define the general version of the linear programming problem, and then we consider some special versions of it.

Linear Programming (LP)

Input: A matrix $A = [a_{ij}]_{i=1,\ldots,m,\ j=1,\ldots,n} \in \mathbb{R}^{m \times n}$, a vector $b \in \mathbb{R}^m$, and a vector $c \in \mathbb{R}^n$, $n, m \in \mathbb{N} - \{0\}$.

Constraints: $\mathcal{M}(A, b, c) = \{X \in \mathbb{R}^n \mid A \cdot X = b \text{ and the elements of } X \text{ are non-negative reals only}\}$.

Costs: For every $X = (x_1, \ldots, x_n)^T \in \mathcal{M}(A, b, c)$, $c = (c_1, \ldots, c_n)^T$,

$cost(X, (A, b, c)) = c^T \cdot X = \sum_{i=1}^{n} c_i x_i$.
Goal: minimum. 

We observe that the constraints $A \cdot X = b$ and $x_i \ge 0$ for $i = 1, \ldots, n$ form a system of $m$ linear equations and $n$ linear inequalities in $n$ unknowns. Every linear equation (with a nonzero coefficient vector) determines an $(n-1)$-dimensional affine subspace (a hyperplane) of the vector space $\mathbb{R}^n$, and every inequality $x_i \ge 0$ determines a half-space. Hence $\mathcal{M}(A, b, c)$ can be considered as the intersection of the $m$ hyperplanes determined by the linear equations³⁴ of $A \cdot X = b$ with the region $(\mathbb{R}^{\ge 0})^n$.

33 That is, the size of each clause of $\Phi$ is at most $k$.

34 Note that there exist several other forms of the linear programming problem. For instance, one can replace the constraint $A \cdot X = b$ by $A \cdot X \ge b$. Or one can take maximization instead of minimization and consider the constraint $A \cdot X \le b$ instead of $A \cdot X = b$.
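For tiny instances, an optimum can be found without any LP machinery by using the standard fact that, if $A$ has full row rank $m$ and the problem has a finite optimum, some basic feasible solution is optimal. The following sketch (not the simplex method, and exponential in $n$) tries every choice of $m$ columns, solves the resulting square system exactly with `Fraction` arithmetic, keeps the non-negative solutions, and returns the cheapest one. The instance is hypothetical and was chosen so that its feasible region is bounded.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(M, rhs):
    """Solve M·y = rhs exactly (Gauss-Jordan over Fractions); None if M is singular."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def lp_min_by_basic_solutions(A, b, c):
    """Minimize c^T·X subject to A·X = b, X >= 0 by trying every set of m basic columns."""
    m, n = len(A), len(A[0])
    best = None
    for basis in combinations(range(n), m):
        xb = solve_square([[A[i][j] for j in basis] for i in range(m)], b)
        if xb is None or any(v < 0 for v in xb):
            continue                      # singular basis or a point outside X >= 0
        X = [Fraction(0)] * n
        for j, v in zip(basis, xb):
            X[j] = v
        cost = sum(Fraction(cj) * xj for cj, xj in zip(c, X))
        if best is None or cost < best[0]:
            best = (cost, X)
    return best

# Hypothetical instance with m = 2 equations and n = 3 unknowns.
A = [[1, 1, 1],
     [1, -1, 0]]
b = [4, 1]
c = [1, 2, 3]
cost, X = lp_min_by_basic_solutions(A, b, c)
print(cost, [str(x) for x in X])   # 11/2 ['5/2', '3/2', '0']
```

Note that the optimum $(5/2, 3/2, 0)$ is not integral, which is exactly what changes when one passes to integer programming.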







In combinatorial optimization we often consider the problem of integer pro- 
gramming. It can be defined by exchanging reals with integers in the problem 
instances as well as in the feasible solutions. 

Integer Linear Programming (IP) 

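Input: An $m \times n$ matrix $A = [a_{ij}]_{i=1,\ldots,m,\ j=1,\ldots,n}$ and two vectors $b = (b_1, \ldots, b_m)^T$, $c = (c_1, \ldots, c_n)^T$ for some $n, m \in \mathbb{N} - \{0\}$, where $a_{ij}, b_i, c_j$ are integers for $i = 1, \ldots, m$, $j = 1, \ldots, n$.

Constraints: $\mathcal{M}(A, b, c) = \{X = (x_1, \ldots, x_n)^T \in \mathbb{Z}^n \mid AX = b \text{ and } x_i \ge 0 \text{ for } i = 1, \ldots, n\}$.

Costs: For every $X = (x_1, \ldots, x_n)^T \in \mathcal{M}(A, b, c)$,

$cost(X, (A, b, c)) = \sum_{i=1}^{n} c_i x_i$.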

Goal: minimum. 

Note that IP is not a subproblem of LP, because we did not restrict only the language of inputs, but also the constraints.

The 0/1-Linear Programming (0/1-LP) is the optimization problem with the language of input instances of IP and the additional constraint requiring that $X \in \{0, 1\}^n$ (i.e., that $\mathcal{M}(A, b, c) \subseteq \{0, 1\}^n$).
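The effect of the integrality constraint can be seen on a small hypothetical instance. The sketch below searches all integer vectors in a box $\{0, \ldots, u\}^n$; this is exact only when every feasible $X$ lies in the box, which holds here because $x_1 + x_2 + x_3 = 4$ and $X \ge 0$ force $x_i \le 4$. With $u = 1$ the same search solves the 0/1-LP version of the instance. (The real relaxation of this instance, LP, has the non-integral optimum $(5/2, 3/2, 0)$ with cost $11/2$.)

```python
from itertools import product

def ip_min_in_box(A, b, c, upper):
    """Brute force over integer X in {0,...,upper}^n with A·X = b.
    Returns (cost, X) or None; exact only if all feasible X lie in the box."""
    m, n = len(A), len(A[0])
    best = None
    for X in product(range(upper + 1), repeat=n):
        if all(sum(A[i][j] * X[j] for j in range(n)) == b[i] for i in range(m)):
            cost = sum(c[j] * X[j] for j in range(n))
            if best is None or cost < best[0]:
                best = (cost, X)
    return best

A = [[1, 1, 1], [1, -1, 0]]
b = [4, 1]
c = [1, 2, 3]
print(ip_min_in_box(A, b, c, upper=4))   # (7, (2, 1, 1))
print(ip_min_in_box(A, b, c, upper=1))   # None: the 0/1-LP instance is infeasible
```

The integer optimum 7 is strictly worse than the real optimum $11/2$, and restricting to $\{0, 1\}^n$ can even make an instance infeasible.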

The last problems we consider are maximization problems on systems of linear equations. The objective is to find values for the unknowns that satisfy the maximal possible number of linear equations of the given system. Let $k \ge 2$ be a prime.

Maximum Linear Equation Problem Mod k (Max-LinModk)

Input: A set $S$ of $m$ linear equations over $n$ unknowns, $n, m \in \mathbb{N} - \{0\}$, with coefficients from $\mathbb{Z}_k$.
(An alternative description of an input is an $m \times n$ matrix over $\mathbb{Z}_k$ and a vector $b \in \mathbb{Z}_k^m$.)

Constraints: $\mathcal{M}(S) = \mathbb{Z}_k^n$
{a feasible solution is any assignment of values from $\{0, 1, \ldots, k - 1\}$ to the $n$ unknowns (variables)}.

Costs: For every $X \in \mathcal{M}(S)$,
$cost(X, S)$ is the number of linear equations of $S$ satisfied by $X$.
Goal: maximum. 

Consider the following example of an input instance over $\mathbb{Z}_2$.

$x_1 + x_2 = 1$
$x_1 + x_3 = 0$
$x_2 + x_3 = 0$
$x_1 + x_2 + x_3 = 1$.

Observe that this system of linear equations does not have any solution in $\mathbb{Z}_2$: already the first three equations are inconsistent, because the sum of their left-hand sides is $2x_1 + 2x_2 + 2x_3 \equiv 0 \pmod 2$, whereas the sum of their right-hand sides is 1. The assignment $x_1 = x_2 = x_3 = 0$ satisfies the second and the third equation. The assignment $x_1 = x_2 = x_3 = 1$ satisfies the last three equations, and the assignment $x_1 = x_3 = 0$, $x_2 = 1$ satisfies the first two equations and the last equation. The last two assignments are optimal solutions.
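Since $\mathcal{M}(S) = \mathbb{Z}_k^n$ is finite, the example can be checked by evaluating all $2^3$ assignments. In the sketch below each equation is encoded as a pair (coefficient vector, right-hand side), an encoding chosen here for illustration. The search shows that the optimum is 3 and that, besides the two assignments mentioned above, $x_1 = 1$, $x_2 = x_3 = 0$ is optimal as well.

```python
from itertools import product

# The example system over Z_2: (coefficients of x1, x2, x3; right-hand side).
S = [((1, 1, 0), 1),
     ((1, 0, 1), 0),
     ((0, 1, 1), 0),
     ((1, 1, 1), 1)]

def satisfied_count(X, system, k):
    """Number of equations a·X = rhs (mod k) satisfied by X."""
    return sum(sum(a * x for a, x in zip(coeffs, X)) % k == rhs
               for coeffs, rhs in system)

def max_lin_mod(system, n, k):
    """Exhaustive search over M(S) = Z_k^n; returns (optimum, all optimal X)."""
    values = {X: satisfied_count(X, system, k) for X in product(range(k), repeat=n)}
    best = max(values.values())
    return best, sorted(X for X, v in values.items() if v == best)

print(max_lin_mod(S, 3, 2))   # (3, [(0, 1, 0), (1, 0, 0), (1, 1, 1)])
```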

For every prime $k$ and every positive integer $m$, we define the problem Max-EmLinModk as the subproblem of Max-LinModk, where the input instances are sets of linear equations such that every linear equation has at most $m$ nonzero coefficients (contains at most $m$ unknowns).

For instance, the first three linear equations of the system considered above form a problem instance of Max-E2LinMod2.



Keywords introduced in Section 2.3.2 

decision problem, primality testing, equivalence problem for polynomials, equiva- 
lence problem for one-time-only branching programs, satisfiability problems, clique 
problem, vertex cover problem, Hamiltonian problem, problems of existence of a 
solution of systems of linear equations, optimization problem, problem instance, 
feasible solution, optimal solution, maximization problem, minimization problem, 
traveling salesperson problem (TSP), metric TSP (Δ-TSP), geometrical TSP,
makespan problem (MS), minimum vertex cover problem (Min-VCP), set cover
problem (SCP), maximum clique problem (Max-CL), minimum cut problem
(Min-Cut), maximum cut problem (Max-Cut), knapsack problem (KP), simple
knapsack problem (SKP), bin-packing problem (Bin-P), maximum satisfiability
problem (Max-Sat), linear programming (LP), integer programming (IP),
0/1-linear programming (0/1-LP), maximum linear equation problem modulo k
(Max-LinModk)

Summary of Section 2.3.2 

A decision problem is a problem of deciding whether a given input has a required property. Since the set of all inputs with the required property can be viewed as a language $L \subseteq \Sigma^*$, the decision problem for $L$ is to decide whether a given input $x \in \Sigma^*$ belongs to $L$ or $x \notin L$.

An optimization problem is specified by 

• the set of problem instances, 

• the constraints determining the set of feasible solutions to every problem in- 
stance (input), 

• the cost function that assigns a cost to every feasible solution, 

• the goal that may be maximization or minimization. 

The objective is to find an optimal solution (one of the best feasible solutions 
according to the cost and the goal) for every input instance. 

Currently, there are thousands of hard algorithmic problems considered in the literature on algorithmics and in numerous practical applications. To learn the fundamentals of algorithmics it is sufficient to consider some of them, the so-called paradigmatic problems. Paradigmatic problems are a kind of pattern problems in the sense that solving most of the hard problems can be reduced to solving some of the paradigmatic problems. Some of the most fundamental decision problems are satisfiability, primality testing, equivalence problem for polynomials, clique problem, and Hamiltonian problems. Some representatives of the paradigmatic optimization problems are maximum satisfiability, traveling salesperson problem, scheduling problems, set cover problem, maximum clique problem, knapsack problem, bin-packing problem, linear programming, integer programming, and maximum linear equation problems.



2.3.3 Complexity Theory 

The aim of this section is to discuss the ways of measuring computational 
complexity of algorithms (computer programs) and to present the main frame- 
work for classifying algorithmic problems according to their computational 
hardness. The first part of this section is useful for all subsequent chapters. 
The second part, devoted to the complexity classes and to the concept of 
NP-completeness, provides (besides the general philosophy of algorithms and 
complexity) fundamentals for Section 4.4.3 on lower bounds for inapproxima- 
bility (i.e., for the classification of optimization problems according to their 
polynomial-time approximability) . 

This book is devoted to the design of programs on the algorithmic level 
and so we will not deal with details of algorithm implementations in specific 
programming languages. To describe algorithms we use either an informal
description of their parts, such as “choose an edge from the graph G and verify
whether the rest of the graph is connected” or a Pascal-like language with 
instructions such as for, repeat, while, if . . . then . . . else, etc. Observe that 
this rough description can be sufficient for the analysis of complexity when 
the complexity of the implementation of the roughly described parts is well 
known. 

The aim of the complexity analysis is to provide a robust analysis in the 
sense that the result does not depend on the structural and technological 
characteristics of concrete sequential computers and their system software. 
Here, we focus on the time complexity of computations and only sometimes 
consider the space complexity. We distinguish two basic ways of complexity 
measurement, namely the uniform cost measurement and the logarithmic 
cost measurement. 

The approach based on the uniform cost measurement is the simplest one. The measurement of time complexity consists of determining the overall number of elementary³⁵ instructions executed in the considered computation, and the measurement of space complexity consists of determining the number of variables used in the computation. The advantage of this measurement is that it is simple. The drawback is that it is not always adequate, because it charges cost 1 for an arithmetic operation over two integers independently of their size. When the operands are integers whose binary representations consist of several hundreds of bits, none of them can be stored in one computer word (16 or 32 bits). Then the operands must be stored in several computer words (i.e., one needs several space units (variables) to save them), and the execution of the arithmetic operation over these two large integers corresponds to the execution of a special program performing an operation over large integers by several operations over integers of the computer word size. Thus, the uniform cost measurement may be applied only in the cases where one can assume that during the whole computation all variables contain values whose size is bounded by a fixed constant (hypothetical computer word length). This is the case for computing over $\mathbb{Z}_p$ for a fixed prime $p$ or for working with logical values only (in Boolean algebra).

35 Elementary instructions are arithmetic instructions over integers, comparison of two integers, reading, writing, loading integers and symbols, etc.

A serious anomaly of the use of the uniform cost measurement appears in the following example. Let $k$ and $a \ge 2$ be two positive integers of sizes not exceeding the size of a hypothetical computer word. Consider the task of computing the number $a^{2^k}$. This can be done with the following strategy. Compute

$a^2 = a \cdot a,\ a^4 = a^2 \cdot a^2,\ a^8 = a^4 \cdot a^4,\ \ldots,\ a^{2^k} = a^{2^{k-1}} \cdot a^{2^{k-1}}$.

The uniform cost space complexity is 3 because, besides $a$ and $k$, one additional variable (the loop counter $i$) is sufficient to execute the following computation

for i := 1 to k do a := a * a.

The uniform cost time complexity is in $O(k)$ because exactly $k$ multiplications are executed. This contrasts with the fact that one needs at least $2^k$ bits to represent the result $a^{2^k}$, and to write $2^k$ bits any machine needs $\Omega(2^k)$ operations on its machine words. Since this consideration works for every positive integer $k$, we have an exponential gap between the uniform cost time complexity and any realistic time complexity, and an unbounded difference between the uniform cost space complexity and any realistic space complexity.

The solution to such a situation, where the values of variables grow unboundedly, is to use the logarithmic cost measurement. With respect to this measurement the cost of every elementary operation is the sum of the sizes of the binary representations of the operands.³⁶ Obviously, this approach to complexity measurement avoids anomalies of the kind mentioned above, and it is generally adopted in the complexity analysis of algorithms. Sometimes one distinguishes the time complexity of distinct operations. Since the best known algorithms for multiplying two $n$-bit integers need $O(n \cdot \log n)$ binary operations, the complexity of multiplication and division is considered to be $O(n \cdot \log n)$, while addition, subtraction, and assignment have costs linear in the binary size of the arguments.

36 If one wants to be very careful then the binary length of the addresses of the 
variables in memory (that correspond to the operands) can be added. 
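The gap between the two measures can be observed directly, because Python integers have unbounded size. The sketch below runs the loop "for i := 1 to k do a := a * a" and tallies the uniform cost (1 per multiplication) and the logarithmic cost (the sum of the binary lengths of the two operands of each multiplication, ignoring the address lengths of footnote 36). The start value $a = 2$ is an arbitrary choice for illustration.

```python
def square_k_times(a, k):
    """Compute a**(2**k) by k squarings and tally two cost measures:
    uniform cost = 1 per multiplication,
    logarithmic cost = sum of the binary lengths of both operands."""
    uniform_cost = 0
    log_cost = 0
    for _ in range(k):
        log_cost += 2 * a.bit_length()   # both operands are the current value of a
        a = a * a
        uniform_cost += 1
    return a, uniform_cost, log_cost

result, uniform_cost, log_cost = square_k_times(2, 5)
print(result == 2 ** 32, uniform_cost, log_cost, result.bit_length())   # True 5 72 33
```

For $a = 2$ the $i$th multiplication has operands of $2^{i-1} + 1$ bits, so the logarithmic cost is $2^{k+1} - 2 + 2k$, while the uniform cost stays $k$.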







Definition 2.3.3.1. Let $\Sigma_I$ and $\Sigma_O$ be alphabets. Let $A$ be an algorithm that realizes a mapping from $\Sigma_I^*$ to $\Sigma_O^*$. For every $x \in \Sigma_I^*$, $Time_A(x)$ denotes the time complexity³⁷ (according to the logarithmic cost) of the computation of $A$ on the input $x$, and $Space_A(x)$ denotes the space complexity (according to the logarithmic cost measurement) of the computation of $A$ on $x$.

One never considers the time complexity $Time_A$ (the space complexity $Space_A$) as a function from $\Sigma_I^*$ to $\mathbb{N}$. This is because we usually have exponentially many inputs for every input length, and to estimate $Time_A(x)$ for every $x \in \Sigma_I^*$ would often be an unrealistic job. Even if this succeeded, the description of $Time_A$ could be so complex that one would have trouble determining some fundamental characteristics of $Time_A$. The comparison of the complexities of two algorithms for the same algorithmic problem could be a difficult job, too. Thus, the complexity is always considered as a function of the input size, and one observes the asymptotic growth of this function.

Definition 2.3.3.2. Let $\Sigma_I$ and $\Sigma_O$ be two alphabets. Let $A$ be an algorithm that computes a mapping from $\Sigma_I^*$ to $\Sigma_O^*$. The (worst case) time complexity of $A$ is a function $Time_A : (\mathbb{N} - \{0\}) \to \mathbb{N}$ defined by

$Time_A(n) = \max\{Time_A(x) \mid x \in \Sigma_I^n\}$

for every positive integer $n$. The (worst case) space complexity of $A$ is a function $Space_A : (\mathbb{N} - \{0\}) \to \mathbb{N}$ defined by

$Space_A(n) = \max\{Space_A(x) \mid x \in \Sigma_I^n\}$.

$Time_A$ is defined in such a way that we know that every input of size $n$ (i.e., every input from $\Sigma_I^n$) is solved by $A$ in time at most $Time_A(n)$ and that there is an input $x$ of size $n$ with $Time_A(x) = Time_A(n)$. This is the reason why one calls this kind of complexity analysis the worst case analysis. The drawback of the worst case analysis shows up when one uses it for an algorithm with very different complexities on inputs of the same length.³⁸ In that case one can consider the average case analysis, which consists of determining the average complexity over all instances of size $n$. There are two problems with this approach. First, to determine the average complexity is usually a much harder problem than to determine $Time_A(n)$, and in many cases we are not able to perform the average complexity analysis. Secondly, the average complexity provides useful information only if the average is taken over a realistic probability distribution over the inputs of any fixed length. Such input distributions may essentially differ from application to application, i.e., an average cost analysis according to a specific input distribution does not provide any robust answer for all possible applications of this algorithm. Even in the case of one specific application, we are often not able to estimate these input distributions for every input size. In this book it will be sufficient to consider the worst case complexity only, because it provides a good characterization of the behavior of almost all considered algorithms.

37 Note that one assumes that an algorithm terminates for every input $x$, and so $Time_A(x)$ is always a non-negative integer.

38 A famous example is the simplex algorithm for the problem of linear programming.
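The difference between $Time_A(n)$ and an average can be made concrete on a toy example. The algorithm below is hypothetical (it is not from the text): it reads a binary input from left to right until the first 1, and we take the number of symbols read as its time on that input, a simplification of the logarithmic cost measure. The worst case is computed exactly as in Definition 2.3.3.2, as a maximum over $\{0, 1\}^n$; the average assumes the uniform distribution on inputs of length $n$, which is an assumption, not a property of real applications.

```python
from fractions import Fraction
from itertools import product

def steps(x):
    """Hypothetical algorithm A: scan x until the first '1' (or the end);
    Time_A(x) is taken to be the number of symbols read."""
    for i, symbol in enumerate(x, start=1):
        if symbol == "1":
            return i
    return len(x)

def worst_case(n):
    """Time_A(n) = max{Time_A(x) | x in {0,1}^n}."""
    return max(steps(x) for x in product("01", repeat=n))

def average_case(n):
    """Average of Time_A(x) over {0,1}^n under the uniform distribution."""
    return Fraction(sum(steps(x) for x in product("01", repeat=n)), 2 ** n)

print(worst_case(4), average_case(4))   # 4 15/8
```

Here $Time_A(n) = n$, whereas the uniform average equals $2 - 2^{1-n}$ and stays below 2 for every $n$, so the two analyses describe this algorithm very differently.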

An important observation is that we have fixed the input length in Definition 2.3.3.2 in the logarithmic cost manner: the length of an input is considered to be the length of its code over $\Sigma_I$. Remember that our interpretation of an input alphabet $\Sigma$ is that it is the set of all computer words allowed. Note that sometimes it is sufficient to consider the unit cost of the input size. This means that the size is the number of items (for instance, integers) of the input. This is again acceptable if all the items of the input are of equal size. For instance, for an $n \times n$ matrix over $\mathbb{Z}_p$ for a fixed prime $p$, one can consider the size $n^2$ or even $n$, and measure the complexity according to this parameter. But usually one may not use this approach for problems from algorithmic number theory, where the input consists of one or a few numbers. In such cases we strictly consider the input size as the length of the binary representation of the input.

In this book we shall very rarely perform a very precise analysis of the designed algorithms, for the following two reasons. First, to make a precise analysis one has to deal with the implementation details that are usually omitted here. Secondly, we consider hard problems only, and we will be satisfied with establishing reasonable asymptotic upper bounds on the complexity of the designed algorithms.

The main goal of complexity theory is to classify algorithmic problems according to their computational difficulty. Since the time complexity, as the number of executed operations, seems to be the central measure of algorithm complexity, one prefers to measure the computational difficulty of problems in terms of time complexity. Intuitively, the time complexity of a problem $U$ could be a function $T_U : \mathbb{N} \to \mathbb{N}$ such that $\Theta(T_U(n))$ operations are necessary and sufficient to solve $U$. But this is still not a consistent definition of the complexity of $U$, because we need to have an algorithmic solution, i.e., an algorithm solving $U$ with time complexity $O(T_U(n))$. Thus, a natural way to define the time complexity of $U$ seems to be to say that the time complexity of $U$ is the time complexity of the “best” (optimal) algorithm for $U$. Unfortunately, the following fundamental result of complexity theory shows that this approach to defining $T_U$ is not consistent.

Theorem 2.3.3.3. There is a decision problem $(L, \Sigma_{bool})$ such that, for every algorithm $A$ deciding $L$, there exists another algorithm $B$ deciding $L$ such that

$Time_B(n) = \log_2(Time_A(n))$

for infinitely many positive integers $n$.

Obviously, Theorem 2.3.3.3 implies that there is no best (optimal) algorithm for $L$, and so it is impossible to define the complexity of $L$ as a function from $\mathbb{N}$ to $\mathbb{N}$ in the way proposed above. This is the reason why one does not try to define the complexity of algorithmic problems but rather lower and upper bounds on the problem complexity.

Definition 2.3.3.4. Let $U$ be an algorithmic problem, and let $f, g$ be functions from $\mathbb{N}$ to $\mathbb{R}^+$. We say that $O(g(n))$ is an upper bound on the time complexity of $U$ if there exists an algorithm $A$ solving $U$ with $Time_A(n) \in O(g(n))$.

We say that $\Omega(f(n))$ is a lower bound on the time complexity of $U$ if every algorithm $B$ solving $U$ has $Time_B(n) \in \Omega(f(n))$.

An algorithm $C$ is optimal for the problem $U$ if $Time_C(n) \in O(g(n))$ and $\Omega(g(n))$ is a lower bound on the time complexity of $U$.

To establish an upper bound on the complexity of a problem $U$, it is sufficient
to find an algorithm solving $U$. Establishing a nontrivial lower bound on
the complexity of $U$ is a very hard task, because it requires proving that each
of the infinitely many known and unknown algorithms solving $U$ has
its time complexity in $\Omega(f(n))$ for some nontrivial $f$. This is in fact a nonexistence
proof, because one has to prove that no algorithm solving $U$ has a time
complexity asymptotically smaller than $f(n)$. The best illustration
of the hardness of proving lower bounds on problem complexity is the fact
that we know thousands of algorithmic problems for which

(i) the time complexity of the best known algorithm is exponential in the 
input size, and 

(ii) no superlinear lower bound such as $\Omega(n \log n)$ is known for any of them.

Thus, for many of these problems we conjecture that there is no algorithm
solving them in time polynomial in the input size, but we are
unable to prove that one really needs more than $O(n)$ time to solve them.

To overcome our inability to prove lower bounds on problem complexity
(i.e., to prove that some problems are hard), concepts were developed that
provide reasonable arguments for the hardness of concrete problems instead of
proofs of their hardness. These concepts are connected with some formal
manipulation of algorithms and complexity in terms of Turing machines
(TMs) and Turing machine complexity. We assume that the reader is familiar
with the Turing machine model. Recall that, following the Church-Turing
thesis, a Turing machine is a formalization of the intuitive notion
of algorithm. This means that a problem $U$ can be solved by an algorithm
(a computer program in any programming language formalism) if and only if
there exists a Turing machine solving $U$. Using the formalism of TMs, it was
proved that for every increasing, time-constructible function $f : \mathbb{N} \to \mathbb{R}^+$

(i) there exists a decision problem such that every TM solving it has time
complexity in $\Omega(f(n))$,

(ii) but there is a TM solving it in $O(f(n) \cdot \log f(n))$ time.




2.3 Fundamentals of Algorithmics 119 



This means that there is an infinite hierarchy of the hardness of decision 
problems. In what follows we shall use the terms algorithm and computer 
program instead of the term Turing machine whenever possible, and so we 
omit unnecessary technicalities. 

One can say that the main objective of complexity theory is

• to find a formal specification of the class of practically solvable problems,
and

• to develop methods enabling the classification of algorithmic problems
according to their membership in this class.

The first efforts to find a reasonable formalization of the intuitive
notion of practically solvable problems resulted in the following definition.
For every TM (algorithm) $M$, let $L(M)$ denote the language decided by $M$.

Definition 2.3.3.5. We define the complexity class P of languages decidable
in polynomial time by

$$\mathrm{P} = \{L = L(M) \mid M \text{ is a TM (an algorithm) with } \mathrm{Time}_M(n) \in O(n^c) \text{ for some positive integer } c\}.$$

A language (decision problem) $L$ is called tractable (practically solvable)
if $L \in \mathrm{P}$. A language $L$ is called intractable if $L \notin \mathrm{P}$.

Definition 2.3.3.5 introduces the class P of decision problems decidable by
polynomial-time computations and proposes exactly the set P as the specification
of the class of tractable (practically solvable) problems. Let us discuss the
advantages and the disadvantages of this formal definition of tractability. The
two main reasons to connect polynomial-time computations with the intuitive
notion of practical solvability are the following:

(1) The definition of the class P is robust in the sense that P is invariant for all
reasonable models of computation. The class P remains the same independent
of whether it is defined in terms of polynomial-time Turing machines,
in terms of polynomial-time computer programs in any programming
language, or in terms of polynomial-time algorithms of any reasonable
formalization of computation. This is a consequence of another fundamental
result of complexity theory, saying that all computation models
(formalizations of the intuitive notion of algorithm) that are realistic with
respect to complexity measurement are polynomially equivalent. Polynomially
equivalent means that if there is a polynomial-time algorithm for an
algorithmic problem $U$ in one formalism, then there is a polynomial-time
algorithm for $U$ in the other formalism, and vice versa. Turing machines
and all programming languages in use belong to this class of polynomially
equivalent computing models. Thus, if one designs a polynomial-time algorithm
for $U$ in C++, then there is a polynomial-time algorithm for $U$ in any
reasonable computing formalism. On the other hand, if one proves that
there is no polynomial-time TM deciding a language $L$, then one can be
sure that there is no polynomial-time computer program deciding $L$. Note
that this kind of robustness is very important, and it must be required of
any reasonable specification of the class of tractable problems.

(2) While the first reason for choosing P is a theoretical one, the second reason
is more connected with intuition about practical solvability and with experience
in algorithm design. Consider the table in Figure 2.14, which illustrates the
growth of the complexity functions $10n$, $2n^2$, $n^3$, $2^n$, and $n!$ for input sizes
10, 50, 100, and 300.



f(n) \ n  10             50             100            300
10n       100            500            1000           3000
2n^2      200            5000           20000          180000
n^3       1000           125000         1000000        27000000
2^n       1024           (16 digits)    (31 digits)    (91 digits)
n!        ≈ 3.6·10^6     (65 digits)    (158 digits)   (615 digits)

Fig. 2.14.



Observe that if the values of $f(n)$ are too large, we write only the number
of digits of the decimal representation of $f(n)$. Assuming that one has a
computer that executes $1000000 = 10^6$ operations per second, an algorithm
$A$ with $\mathrm{Time}_A(n) = n^3$ runs in 27 seconds for $n = 300$. But if
$\mathrm{Time}_A(n) = 2^n$, then the execution of $A$ would take more than 30 years
for $n = 50$, and more than $3 \cdot 10^{16}$ years for $n = 100$. If one compares
the values of $2^n$ and $n!$ for a realistic input size between 100 and 300 with
the suggested number of seconds since the "Big Bang", which has 21 digits³⁹,
then everybody sees that the execution of algorithms of exponential complexity
on realistic inputs is beyond the borders of physical reality. Moreover, observe
the following properties of the functions $n^3$ and $2^n$. If $M$ is the time you
can wait for the results, then developing a computer that executes twice as
many instructions per time unit as the previous computer helps you

(i) to increase the size of tractable input instances from $M^{1/3}$ to
$\sqrt[3]{2} \cdot M^{1/3}$ for an $n^3$-algorithm (i.e., one can handle input instances
that are $\sqrt[3]{2} \approx 1.26$ times larger than before), but

(ii) to increase the size of tractable input instances by only 1 for a
$2^n$-algorithm, because $2^{n+1} = 2 \cdot 2^n$.

Thus, algorithms of exponential complexity cannot be considered practical,
while algorithms of polynomial time complexity $O(n^c)$ for small values of $c$
can be considered practical. Of course, a running time of $n^{1000}$ is unlikely
to be of any practical use, because $n^{1000} > 2^n$ for all reasonable input
sizes $n$. Nevertheless, experience has proved that it is reasonable to consider
polynomial-time computations tractable. In almost all cases, once
a polynomial-time algorithm has been found for an algorithmic problem
that formerly appeared to be hard, some key insight into the problem
has been gained, and new polynomial-time algorithms with a polynomial of
low⁴⁰ degree have been designed for the problem. There are only a few
known exceptions of nontrivial problems where the best polynomial-time
algorithm is not of practical utility.

³⁹ Note that the number of protons in the known universe has 79 digits.
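The entries of Fig. 2.14 and the doubling argument can be recomputed exactly with Python's arbitrary-precision integers. The following is a small sketch: the machine speed of $10^6$ operations per second is the assumption made in the text, while the operation budget of $10^{12}$ used for the doubling experiment and the helper names are illustrative choices, not part of the text.

```python
import math
from typing import Callable, Dict

# The complexity functions of Fig. 2.14, evaluated exactly with Python integers.
FUNCTIONS: Dict[str, Callable[[int], int]] = {
    "10n": lambda n: 10 * n,
    "2n^2": lambda n: 2 * n * n,
    "n^3": lambda n: n ** 3,
    "2^n": lambda n: 2 ** n,
    "n!": math.factorial,
}


def num_digits(value: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(value))


def largest_feasible_n(f: Callable[[int], int], budget: int) -> int:
    """Largest n with f(n) <= budget, found by linear search (f assumed increasing)."""
    n = 0
    while f(n + 1) <= budget:
        n += 1
    return n


if __name__ == "__main__":
    for name, f in FUNCTIONS.items():
        print(f"{name:5}", [num_digits(f(n)) for n in (10, 50, 100, 300)])

    ops_per_second = 10 ** 6            # machine speed assumed in the text
    seconds_per_year = 365 * 24 * 3600
    print("n^3 at n = 300:", 300 ** 3 / ops_per_second, "seconds")
    print("2^n at n = 50:", 2 ** 50 / ops_per_second / seconds_per_year, "years")

    # A twice-as-fast computer doubles the number of operations affordable in
    # the same waiting time; 10^12 operations is an arbitrary example budget.
    budget = 10 ** 12
    for name in ("n^3", "2^n"):
        f = FUNCTIONS[name]
        print(name, largest_feasible_n(f, budget), "->", largest_feasible_n(f, 2 * budget))
```

With the budget doubled, the feasible input size of the $n^3$-algorithm grows from 10000 to 12599 (a factor of about $\sqrt[3]{2}$), whereas that of the $2^n$-algorithm grows only from 39 to 40.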

Above, we argued that P is a good specification of the class of practically
solvable problems. Nevertheless, this whole book is devoted to solving
problems that are probably not in P. But this does not destroy the idea of taking
polynomial time as the threshold of practical solvability. Our approaches to
solving problems outside P usually relax our requirements, for instance by

• using randomized algorithms (providing the right solution with some
probability) instead of deterministic ones (providing the right solution with
certainty), or

• searching for an approximation of an optimal solution instead of searching
for an optimal solution.

Thus, the current view of the specification of the class of tractable
problems is more or less connected with randomized polynomial-time
(approximation⁴¹) algorithms.

Having the class P, one would like to have methods for classifying problems
according to their membership in P. To prove the membership of a decision
problem $L$ in P, it is sufficient to design a polynomial-time algorithm for $L$. As
already mentioned above, we do not have any method that would be able to
prove, for most of the practical problems of interest, that they are not in P, i.e.,
that they are intractable (hard). To overcome this unpleasant situation, the
concept of NP-completeness was introduced. This concept provides at least a
good reason to believe that a specific problem is hard, even when one is unable
to prove this fact.

To introduce the concept of NP-completeness, we have to consider
nondeterministic computation. Nondeterminism is not a natural mode of
computation from the practical point of view, because we do not know of any
way to implement it efficiently on real computers.⁴² For the reader not familiar
with nondeterministic Turing machines: one can introduce nondeterminism into
every programming language by adding an operation $\mathrm{choice}(a, b)$ with the
meaning "goto $a$ or goto $b$". Thus, the computation may branch into two computations.

⁴⁰ At most 6, but often 3.

⁴¹ In the case of optimization problems.

⁴² In fact, we do not believe that there exists an efficient simulation of
nondeterministic computations by deterministic ones.




This means that a nondeterministic TM (algorithm) may have many computations
on an input, while any deterministic TM (algorithm) has exactly
one computation for every input. One usually represents all computations of
a nondeterministic algorithm $A$ on an input $x$ by the so-called computation
tree of $A$ on $x$. Such a computation tree is depicted in Figure 2.15 for a
nondeterministic algorithm $A$ that accepts Sat (i.e., solves the decision problem
$(\mathrm{Sat}, \Sigma_{logic})$).




The tree $T_A(x)$ in Figure 2.15 contains all computations of $A$ on the input
$x$ corresponding to the formula

$$\Phi_x = (x_1 \lor x_2) \land (x_1 \lor x_2 \lor x_3) \land (\overline{x}_1 \lor x_3) \land \overline{x}_2$$

over the three variables $x_1$, $x_2$, and $x_3$. $A$ proceeds as follows. First, $A$
deterministically (i.e., without computation branching) verifies in the
computation part $C$ (Figure 2.15) whether $x$ is a code of a formula $\Phi_x$ over
$\Sigma_{logic}$. If not, $A$ rejects the input. Assume that $x$ codes the above formula
$\Phi_x$. The general strategy of $A$ is to nondeterministically guess an assignment
that satisfies $\Phi_x$. This is realized in as many branching steps as there are
variables in the formula. In our example (Figure 2.15), $A$ first branches into
two computations. The left one corresponds to the guess $x_1 = 0$ and the
right one to the guess $x_1 = 1$. Each of these two computations immediately
branches into two computations according to the choice of the Boolean
value for $x_2$. Thus, we obtain 4 computations, and each of them immediately
branches according to the choice of $x_3$. Finally, we have 8 computations, each
corresponding to the choice of one of the 8 assignments for $\{x_1, x_2, x_3\}$.
For every assignment $\alpha$, the corresponding computation deterministically
verifies whether $\alpha$ satisfies $\Phi_x$. If $\alpha$ satisfies $\Phi_x$, $A$ accepts $x$; otherwise, $A$
rejects $x$. We observe in Figure 2.15 that $A$ accepts $x$ in the computation $C_{101}$,
because 101 is the only assignment satisfying $\Phi_x$. All other computations
$C_\alpha$, $\alpha \neq 101$, reject $x$.
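Running a nondeterministic algorithm deterministically amounts to visiting every leaf of its computation tree. The sketch below does this for the guess-and-verify strategy just described, with one branching level per variable. It assumes a DIMACS-like encoding of CNF formulas (the integer $k$ for $x_k$, $-k$ for $\overline{x}_k$) instead of a word over $\Sigma_{logic}$, and its example formula PHI is hypothetical: a four-clause formula in which, as in the text, 101 is the only satisfying assignment. The tree has $2^n$ leaves, which is exactly the cost that nondeterminism hides.

```python
from typing import Dict, List, Tuple

Clause = List[int]  # literal +k stands for x_k, literal -k for its negation


def satisfies(formula: List[Clause], assignment: Dict[int, int]) -> bool:
    """Deterministic verification phase: does the assignment satisfy every clause?"""
    return all(
        any(assignment[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
        for clause in formula
    )


def computation_tree_leaves(
    formula: List[Clause], num_vars: int, prefix: Tuple[int, ...] = ()
) -> List[Tuple[str, bool]]:
    """Follow both branches of every nondeterministic choice x_i in {0, 1}.

    Returns one (assignment, accepted) pair per computation, ordered from the
    leftmost to the rightmost leaf of the computation tree (left branch = 0).
    """
    if len(prefix) == num_vars:
        assignment = {i + 1: bit for i, bit in enumerate(prefix)}
        return [("".join(map(str, prefix)), satisfies(formula, assignment))]
    leaves: List[Tuple[str, bool]] = []
    for bit in (0, 1):  # branching step for the next variable
        leaves += computation_tree_leaves(formula, num_vars, prefix + (bit,))
    return leaves


# Hypothetical example formula: 101 is its only satisfying assignment.
PHI = [[1, 2], [1, 2, 3], [-1, 3], [-2]]

if __name__ == "__main__":
    for path, accepted in computation_tree_leaves(PHI, 3):
        print(f"C_{path}: {'accept' if accepted else 'reject'}")
```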

The acceptance and the complexity of nondeterministic algorithms are 
defined as follows. 

Definition 2.3.3.6. Let $M$ be a nondeterministic TM (algorithm). We say
that $M$ accepts a language $L$, $L = L(M)$, if

(i) for every $x \in L$, there exists at least one computation of $M$ that accepts
$x$, and

(ii) for every $y \notin L$, all computations of $M$ reject $y$.

For every input $w \in L$, the time complexity $\mathrm{Time}_M(w)$ of $M$ on $w$ is
the time complexity of the shortest accepting computation of $M$ on $w$. The
time complexity of $M$ is the function $\mathrm{Time}_M$ from $\mathbb{N}$ to $\mathbb{N}$ defined by

$$\mathrm{Time}_M(n) = \max\{\mathrm{Time}_M(x) \mid x \in L(M) \cap \Sigma^n\}.$$

We define the class

$$\mathrm{NP} = \{L(M) \mid M \text{ is a polynomial-time nondeterministic TM}\}$$

as the class of decision problems decided nondeterministically in polynomial
time.

We observe that, for a nondeterministic algorithm, it is sufficient that one
of the choices is the right way to the solution (the correct answer). For a
decision problem $(L, \Sigma)$, any (deterministic) algorithm $B$ has to decide whether
$x \in L$ or $x \notin L$ for every input $x$. The accepting (rejecting) computation of $B$
on $x$ can be viewed as a proof of the fact $x \in L$ ($x \notin L$). So, the complexity
of (deterministic) algorithms for decision problems may be considered as the
complexity of producing a proof of the correctness of the output. Following the
example of the nondeterministic algorithm for Sat (Figure 2.15), the essential
part of the computation from the complexity point of view is the verification
of whether a guessed assignment satisfies the given formula. Thus, the
complexity of the nondeterministic algorithm $A$ is in fact the complexity of
verifying whether "a given assignment $\alpha$ proves the satisfiability of $\Phi_x$".
This leads to the following hypothesis:




The complexity of deterministic computations is the complexity of
proving the correctness of the produced output, while the complexity
of nondeterministic computations is equivalent to the complexity of
deterministic verification of a given proof (certificate) of the fact $x \in L$.

In what follows we show that this hypothesis is true for polynomial-time 
computations. The fact that deterministic computations can be viewed as 
proofs of the correctness of the produced outputs is obvious. To prove that 
the nondeterministic complexity is the complexity of the verification we need 
the following formal concept. 

Definition 2.3.3.7. Let $L \subseteq \Sigma^*$ be a language. An algorithm $A$ working on
inputs from $\Sigma^* \times \{0,1\}^*$ is called a verifier for $L$, denoted $L = V(A)$, if

$$L = \{w \in \Sigma^* \mid A \text{ accepts } (w, c) \text{ for some } c \in \{0,1\}^*\}.$$

If $A$ accepts $(w, c) \in \Sigma^* \times \{0,1\}^*$, we say that $c$ is a proof (certificate⁴³)
of the fact $w \in L$.

A verifier $A$ for $L$ is called a polynomial-time verifier if there exists
a positive integer $d$ such that, for every $w \in L$, $\mathrm{Time}_A(w, c) \in O(|w|^d)$ for a
proof $c$ of $w \in L$.

We define the class of polynomially verifiable languages as

$$\mathrm{VP} = \{V(A) \mid A \text{ is a polynomial-time verifier}\}.$$

We illustrate Definition 2.3.3.7 with the following example. A verifier for
Sat is an algorithm that, for each input $(x, c) \in \Sigma^*_{logic} \times \Sigma^*_{bool}$, interprets
$x$ as a representation of a formula $\Phi_x$ and $c$ as an assignment of Boolean values
to the variables of $\Phi_x$. If this interpretation is possible (i.e., $x$ is a correct code
of a formula $\Phi_x$ and the length of $c$ is equal to the number of variables in $\Phi_x$),
then the verifier checks whether $c$ satisfies $\Phi_x$. Obviously, the verifier
accepts $(x, c)$ if and only if $c$ is an assignment satisfying $\Phi_x$. We observe that
the verifier is a polynomial-time algorithm, because a certificate $c$ for $x$ is
always shorter than $x$ and one can efficiently evaluate a formula for a given
assignment to its variables.
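A minimal sketch of this verifier in Python, assuming as a simplification that the formula is handed over already parsed into clauses in a DIMACS-like convention (the integer $k$ for $x_k$, $-k$ for $\overline{x}_k$) rather than as a word over $\Sigma_{logic}$; the function name and the encoding are illustrative. Both the interpretation step and the evaluation step make one pass over the input, so the running time is linear in $|x| + |c|$.

```python
from typing import List


def sat_verifier(clauses: List[List[int]], num_vars: int, certificate: str) -> bool:
    """Accept (x, c) iff c is an assignment satisfying the formula coded by x.

    The input x is modelled as (clauses, num_vars) with literal +k for x_k and
    -k for its negation; the certificate c is a word over {0, 1} whose i-th
    symbol is the value of x_{i+1}.
    """
    # Interpretation step: reject if (x, c) cannot be read as (formula, assignment).
    if len(certificate) != num_vars or any(ch not in "01" for ch in certificate):
        return False
    if any(lit == 0 or abs(lit) > num_vars for clause in clauses for lit in clause):
        return False
    # Evaluation step: a single pass over the formula, i.e., linear time.
    return all(
        any((certificate[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)
        for clause in clauses
    )


if __name__ == "__main__":
    phi = [[1, 2], [-1, 3], [-2]]
    print(sat_verifier(phi, 3, "101"))
```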

Exercise 2.3.3.8. Describe a polynomial-time verifier for

(i) HC, 

(ii) VC, and 

(iii) Clique. □ 

The following theorem proves our hypothesis in the framework of
polynomial-time computations.

⁴³ Note that a certificate of "$w \in L$" need not be a mathematical proof
of the fact "$w \in L$". Rather, a certificate should be considered as additional
information that essentially simplifies proving the fact "$w \in L$".




Theorem 2.3.3.9.

$$\mathrm{NP} = \mathrm{VP}.$$

Proof. We prove NP = VP by proving $\mathrm{NP} \subseteq \mathrm{VP}$ and $\mathrm{VP} \subseteq \mathrm{NP}$.

(i) We prove $\mathrm{NP} \subseteq \mathrm{VP}$. Let $L \in \mathrm{NP}$, $L \subseteq \Sigma^*$. Then there exists a
polynomial-time nondeterministic algorithm (TM) $M$ such that $L = L(M)$.
One can construct a polynomial-time verifier $A$ that works as follows:

$A$: Input: $(x, c) \in \Sigma^* \times \Sigma^*_{bool}$.

(1) $A$ interprets $c$ as a navigator for the simulation of the nondeterministic
choices of $M$. $A$ simulates the work of $M$ on $x$ step by step. If $M$
has a choice of two possibilities, then $A$ takes the first one if the next
bit of $c$ is 0, and the second one if the next bit of $c$ is 1.
In this way, $A$ simulates exactly one of the computations of $M$ on $x$.

(2) If $M$ still has a choice and $A$ has already used all bits of $c$, then $A$ halts
and rejects.

(3) If $A$ succeeds in simulating a complete computation of $M$ on $x$, then $A$
accepts $(x, c)$ iff $M$ accepts $x$ in this computation.

Obviously, $V(A) = L(M)$, because if $M$ accepts $x$, then there exists a
certificate $c$ that corresponds to a sequence of nondeterministic choices
unambiguously determining an accepting computation of $M$ on $x$; conversely,
$A$ accepts $(x, c)$ only if $c$ navigates $M$ through an accepting computation
on $x$. Since $A$ does nothing other than simulate $M$ step by step and, for every
$x \in L(M)$, there is a certificate $c$ for which $A$ simulates the shortest accepting
computation of $M$ on $x$, $A$ is a polynomial-time verifier for $L$.

(ii) We prove $\mathrm{VP} \subseteq \mathrm{NP}$. Let $L \subseteq \Sigma^*$, for an alphabet $\Sigma$, be a language from
VP. Thus, there exists a polynomial-time verifier $A$ such that $V(A) = L$.
One can design a polynomial-time nondeterministic algorithm $M$ that
simulates $A$ as follows.

$M$: Input: an $x \in \Sigma^*$.

(1) $M$ nondeterministically generates a word $c \in \{0,1\}^*$.

(2) $M$ simulates the work of $A$ on $(x, c)$ step by step.

(3) $M$ accepts $x$ if $A$ accepts $(x, c)$, and $M$ rejects $x$ if $A$ rejects $(x, c)$.

Obviously, $L(M) = V(A)$. Moreover, $M$ runs in polynomial time: since $A$
works in time $O(|x|^d)$ on a suitable certificate, it reads at most $O(|x|^d)$ bits
of $c$, and so it suffices for $M$ to generate words $c$ of this length. □
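The navigator construction of part (i) can be made concrete with a hypothetical interface: a nondeterministic algorithm is written as an ordinary Python function that obtains each binary choice, the counterpart of $\mathrm{choice}(a, b)$, by calling a callback `choose()`. The verifier feeds it the bits of $c$ and rejects when they run out, as in step (2). The example uses the clique problem, with graphs assumed to be given as a vertex count plus a set of two-element frozensets; `accepts_some_certificate` is only a deterministic, exponential-time stand-in for the machine $M$ of part (ii), useful for checking $V(A) = L(M)$ on tiny inputs.

```python
from itertools import combinations, product
from typing import Callable, FrozenSet, Set, Tuple

Graph = Tuple[int, Set[FrozenSet[int]]]  # (number of vertices, set of edges {u, v})


class OutOfChoices(Exception):
    """Raised when the certificate has no bits left for a nondeterministic choice."""


def nondet_clique(graph: Graph, k: int, choose: Callable[[], int]) -> bool:
    """Nondeterministic algorithm M: guess a vertex set, then check it is a k-clique.

    Each call to choose() is one binary branching (0 = skip vertex, 1 = take it).
    """
    n, edges = graph
    chosen = [v for v in range(n) if choose() == 1]
    return len(chosen) >= k and all(
        frozenset(pair) in edges for pair in combinations(chosen, 2)
    )


def verifier(graph: Graph, k: int, certificate: str) -> bool:
    """Verifier A from part (i): the bits of c navigate the choices of M."""
    bits = iter(certificate)

    def choose() -> int:
        try:
            return int(next(bits))
        except StopIteration:
            raise OutOfChoices() from None

    try:
        return nondet_clique(graph, k, choose)
    except OutOfChoices:
        return False  # step (2): M still has a choice but c is used up


def accepts_some_certificate(graph: Graph, k: int) -> bool:
    """Deterministic stand-in for M of part (ii): try all 2^n certificates of length n.

    M guesses c in a single nondeterministic computation; enumerating all the
    guesses takes exponential time and is only meant for tiny examples.
    """
    n, _ = graph
    return any(verifier(graph, k, "".join(c)) for c in product("01", repeat=n))
```

For example, on the graph with vertices 0 to 3 and edges {0, 1}, {0, 2}, {1, 2}, {2, 3}, the certificate 1110 navigates $M$ to the accepting computation that selects the triangle {0, 1, 2}, while the too-short certificate 111 is rejected by step (2).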

Now we have the following situation. We have defined two language classes,
P and NP. Almost all interesting decision problems appearing in practice are
in NP, so NP is an interesting class from the practical point of view, too.
Almost everybody conjectures, if not directly believes, that $\mathrm{P} \subsetneq \mathrm{NP}$. The two
main reasons for this conjecture follow.

(i) A theoretical reason

People do not believe that finding a proof is as easy as verifying the
correctness of a given proof. This mathematical intuition supports the
hypothesis $\mathrm{P} \subsetneq \mathrm{NP} = \mathrm{VP}$.




(ii) A practical reason (experience)

We know more than 3000 problems in NP, many of which have been
investigated for 40 years, for which no deterministic polynomial-time algorithm
is known. It is not very probable that this is only the consequence of our
inability to find efficient algorithms for them. Even if this were the
case, for current practice the classes P and NP are different, because
we do not have polynomial-time algorithms for numerous problems from
NP.

This provides a new idea for how to "prove" the hardness of some problems,
even if we do not have direct mathematical methods for this purpose. Let us
try to prove $L \notin \mathrm{P}$ for an $L \in \mathrm{NP}$ under the additional assumption $\mathrm{P} \subsetneq \mathrm{NP}$. The
idea is to say that a decision problem $L$ is one of the hardest problems in NP
if $L \in \mathrm{P}$ would immediately imply P = NP. Since we do not believe P = NP,
this is a reasonable argument to believe that $L \notin \mathrm{P}$, i.e., that $L$ is hard. Since
we want to avoid hard proofs of the nonexistence of efficient algorithms, we
define the hardest problems of NP as those problems for which any hypothetical
efficient algorithm can be transformed into an efficient algorithm
for any other decision problem in NP. The following definition formalizes this
idea.

Definition 2.3.3.10. Let $L_1 \subseteq \Sigma_1^*$ and $L_2 \subseteq \Sigma_2^*$ be two languages. We say
that $L_1$ is polynomial-time reducible⁴⁴ to $L_2$, $L_1 \le_p L_2$, if there exists
a polynomial-time algorithm $A$ that computes a mapping from $\Sigma_1^*$ to $\Sigma_2^*$ such
that, for every $x \in \Sigma_1^*$,

$$x \in L_1 \iff A(x) \in L_2.$$

$A$ is called the polynomial-time reduction from $L_1$ to $L_2$.

A language $L$ is called NP-hard if, for every $U \in \mathrm{NP}$, $U \le_p L$.

A language $L$ is called NP-complete if

(i) $L \in \mathrm{NP}$, and

(ii) $L$ is NP-hard.

First, observe that $L_1 \le_p L_2$ means that $L_2$ is at least as hard as $L_1$,
because if a polynomial-time algorithm $M$ decides $L_2$, then the "concatenation"
of the reduction $A$ from $L_1$ to $L_2$ and $M$ provides a polynomial-time algorithm
for $L_1$. The following claim shows that NP-hardness is exactly the notion we
have been looking for.

Lemma 2.3.3.11. If $L$ is NP-hard and $L \in \mathrm{P}$, then P = NP.

Proof. Let $L \subseteq \Sigma^*$ be an NP-hard language and let $L \in \mathrm{P}$. Then there is
a polynomial-time algorithm $M$ with $L = L(M)$. We prove that, for every
$U \in \mathrm{NP}$, $U \subseteq \Gamma^*$, there is a polynomial-time algorithm $A_U$ with $L(A_U) = U$,
i.e., that $U \in \mathrm{P}$. Since $U \le_p L$, there exists a polynomial-time algorithm $B$
such that $x \in U$ iff $B(x) \in L$. An algorithm $A_U$ with $L(A_U) = U$ can work as
follows.

$A_U$: Input: an $x \in \Gamma^*$.

Step 1: $A_U$ simulates the work of $B$ on $x$ and computes $B(x)$.

Step 2: $A_U$ simulates the work of $M$ on $B(x) \in \Sigma^*$. $A_U$ accepts $x$ iff $M$
accepts $B(x)$.

Since $x \in U$ iff $B(x) \in L$, we have $L(A_U) = U$. Since
$\mathrm{Time}_{A_U}(x) = \mathrm{Time}_B(x) + \mathrm{Time}_M(B(x))$, $B$ and $M$ work in polynomial
time, and $|B(x)|$ is polynomial in $|x|$ (a polynomial-time algorithm cannot
write a longer output), we see that $A_U$ is a polynomial-time algorithm. Hence
$\mathrm{NP} \subseteq \mathrm{P}$, and since $\mathrm{P} \subseteq \mathrm{NP}$ holds trivially, P = NP. □

⁴⁴ Note that the polynomial-time reducibility of Definition 2.3.3.10 is also called
Karp-reducibility or polynomial-time many-to-one reducibility in the literature.

The remaining task is to prove, for a specific language L, that all languages 
from NP are reducible to L. This so-called master reduction was proved for 
Sat. We do not present the proof here because we want to omit technical 
considerations based on the Turing machine formalism. 

Theorem 2.3.3.12 (Cook's Theorem). Sat is NP-complete.⁴⁵

From the practical point of view, one is interested in a simple method
for establishing that a problem $U$ of interest is NP-hard. One does not need
any variation of the master reduction to do it. As claimed in the following
observation, it is sufficient to take a known NP-hard problem $L$ and to find a
polynomial-time reduction from $L$ to $U$.

Observation 2.3.3.13. Let $L_1$ and $L_2$ be two languages. If $L_1 \le_p L_2$ and $L_1$
is NP-hard, then $L_2$ is NP-hard.

Exercise 2.3.3.14. Prove Observation 2.3.3.13.

(Hint: The proof of Observation 2.3.3.13 is very similar to the proof of Lemma
2.3.3.11.) □

Historically, by using the claim of Observation 2.3.3.13, Sat has been
used to prove NP-completeness for more than 3000 decision problems. An
interesting point is that an NP-complete problem can be viewed as a problem
that somehow codes any other problem from NP. For instance, the
NP-completeness of Sat means that every decision problem $(L, \Sigma)$ from NP can
be expressed in the language of Boolean formulae. This is right because, for
each input $x \in \Sigma^*$, we can efficiently construct a Boolean formula $\Phi_x$ that
is satisfiable iff $x \in L$. Similarly, proving NP-hardness of a problem from
graph theory shows that every problem from NP can be expressed in a
graph-theoretical language. In what follows we present some examples of reductions
between different formal representations (languages).



⁴⁵ A very detailed, transparent proof of this theorem is given in [Hro03].




Lemma 2.3.3.15. Sat $\le_p$ Clique.

Proof. Let $\Phi = F_1 \land F_2 \land \cdots \land F_m$ be a formula in CNF, where
$F_i = (l_{i1} \lor l_{i2} \lor \cdots \lor l_{ik_i})$, $k_i \in \mathbb{N} - \{0\}$, for $i = 1, 2, \ldots, m$. We construct an
input instance $(G, k)$ of the clique problem such that $G$ contains a $k$-clique
iff $\Phi$ is satisfiable, as follows.

$k := m$;

$G = (V, E)$, where

$V := \{[i, j] \mid 1 \le i \le m, 1 \le j \le k_i\}$,
i.e., we take one vertex for every occurrence of a literal in $\Phi$;

$E := \{\{[i, j], [r, s]\} \mid [i, j], [r, s] \in V, i \neq r, \text{ and } l_{ij} \neq \overline{l_{rs}}\}$,

i.e., the edges connect only vertices corresponding to literals from different
clauses, and additionally, if $\{u, v\} \in E$, then the literal corresponding to $u$ is not
the negation of the literal corresponding to $v$.

Observe that the above construction of $(G, k)$ from $\Phi$ can be computed
efficiently in a straightforward way.

Figure 2.16 shows the graph $G$ corresponding to the formula

$$\Phi = (x_1 \lor x_2) \land (x_1 \lor x_2 \lor x_3) \land (\overline{x}_1 \lor x_3) \land \overline{x}_2.$$

It remains to be shown that

$$\Phi \text{ is satisfiable} \iff G \text{ contains a clique of size } k = m. \qquad (2.27)$$

The idea of the proof is that two literals (vertices) $l_{ij}$ and $l_{rs}$ are connected
by an edge if and only if they are from different clauses and there exists
an assignment for which both $l_{ij}$ and $l_{rs}$ have the value 1. Thus, a clique
corresponds to the possibility of finding an input assignment that evaluates
all the literals corresponding to the vertices of the clique to 1.

We prove the equivalence (2.27) by proving both implications in turn.

(i) Let $\Phi$ be satisfiable. Thus, there exists an assignment $\varphi$ such that
$\varphi(\Phi) = 1$. Obviously, $\varphi(F_i) = 1$ for all $i \in \{1, \ldots, m\}$. This implies that, for every
$i \in \{1, \ldots, m\}$, there exists a $d_i \in \{1, \ldots, k_i\}$ such that $\varphi(l_{id_i}) = 1$. We
claim that the set of vertices $\{[i, d_i] \mid 1 \le i \le m\}$ defines an $m$-clique
in $G$. Obviously, $[1, d_1], [2, d_2], \ldots, [m, d_m]$ are from different clauses. The
equality $l_{id_i} = \overline{l_{jd_j}}$ for some $i \neq j$ would imply $\omega(l_{id_i}) \neq \omega(l_{jd_j})$ for every
input assignment $\omega$, and so $\varphi(l_{id_i}) = \varphi(l_{jd_j}) = 1$ would be impossible. Thus,
$l_{id_i} \neq \overline{l_{jd_j}}$ for all $i, j \in \{1, \ldots, m\}$, $i \neq j$, and $\{[i, d_i], [j, d_j]\} \in E$ for all
$i, j = 1, \ldots, m$, $i \neq j$.




[Figure: the graph G, with its vertices grouped by clause (labels include "Third clause" and "Fourth clause").]

Fig. 2.16.

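(ii) Let $Q$ be a clique of $G$ with $k = m$ vertices. Since two vertices are
connected in $G$ only if they correspond to two literals from different clauses,
there exist $d_1, d_2, \ldots, d_m$, $d_p \in \{1, 2, \ldots, k_p\}$ for $p = 1, \ldots, m$, such that
$Q = \{[1, d_1], [2, d_2], \ldots, [m, d_m]\}$. Following the construction, no two of the
literals $l_{1d_1}, l_{2d_2}, \ldots, l_{md_m}$ are complementary, so there exists an assignment
$\varphi$ that sets all of them to 1 (variables not occurring among them can be set
arbitrarily), i.e., $\varphi(l_{1d_1}) = \varphi(l_{2d_2}) = \cdots = \varphi(l_{md_m}) = 1$. This
directly implies $\varphi(F_1) = \varphi(F_2) = \cdots = \varphi(F_m) = 1$, and so $\varphi$ satisfies
$\Phi$. □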
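The construction is easy to program. The sketch below assumes a DIMACS-like clause encoding (the integer $k$ for $x_k$, $-k$ for $\overline{x}_k$) and represents edges as two-element frozensets; the function names are illustrative. The reduction itself is the loop over vertex pairs in `sat_to_clique`, which is quadratic in the length of the formula; `has_clique` and `is_satisfiable` are exponential brute-force checks included only to test equivalence (2.27) on tiny formulas.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set, Tuple

Vertex = Tuple[int, int]  # [i, j]: the j-th literal of clause i (1-based, as in the proof)


def sat_to_clique(clauses: List[List[int]]) -> Tuple[List[Vertex], Set[FrozenSet[Vertex]], int]:
    """Construct (G, k) from a CNF formula as in the proof of Lemma 2.3.3.15."""
    vertices = [(i, j) for i, clause in enumerate(clauses, 1) for j in range(1, len(clause) + 1)]
    literal: Dict[Vertex, int] = {(i, j): clauses[i - 1][j - 1] for (i, j) in vertices}
    edges = {
        frozenset((u, v))
        for u, v in combinations(vertices, 2)
        # different clauses, and l_ij is not the negation of l_rs
        if u[0] != v[0] and literal[u] != -literal[v]
    }
    return vertices, edges, len(clauses)  # k = m


def has_clique(vertices: List[Vertex], edges: Set[FrozenSet[Vertex]], k: int) -> bool:
    """Brute-force k-clique test (exponential); used only to check tiny instances."""
    return any(
        all(frozenset(pair) in edges for pair in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )


def is_satisfiable(clauses: List[List[int]], num_vars: int) -> bool:
    """Brute-force satisfiability test over all 2^n assignments."""
    return any(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses)
        for bits in product((False, True), repeat=num_vars)
    )
```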

Lemma 2.3.3.16. Clique $\le_p$ VC.

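Proof. Let $G = (V, E)$, $k$ be an input of the clique problem. We construct an
input $(\overline{G}, m)$ of the vertex cover problem as follows:

$m := |V| - k$,

$\overline{G} = (V, \overline{E})$, where $\overline{E} = \{\{v, u\} \mid v, u \in V, u \neq v, \text{ and } \{u, v\} \notin E\}$.

Obviously, this construction can be executed in linear time (in the size of an
adjacency-matrix representation of $G$).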

Figure 2.17 illustrates the construction of the graph $\overline{G}$ from $G$. The idea
of the construction is the following. If $Q$ is a clique in $G$, then there is
no edge between any pair of vertices from $Q$ in $\overline{G}$. Thus, $V - Q$ must be a
vertex cover in $\overline{G}$. So, the clique $\{v_1, v_4, v_5\}$ of $G$ in Figure 2.17 corresponds
to the vertex cover $\{v_2, v_3\}$ in $\overline{G}$. The clique $\{v_1, v_3, v_5\}$ of $G$ corresponds to
the vertex cover $\{v_2, v_4\}$ in $\overline{G}$, and the clique $\{v_1, v_2\}$ in $G$ corresponds to the
vertex cover $\{v_3, v_4, v_5\}$ in $\overline{G}$.

Fig. 2.17.

Obviously, for proving

$$(G, k) \in \mathrm{Clique} \iff (\overline{G}, |V| - k) \in \mathrm{VC}$$

it is sufficient to prove

"$S \subseteq V$ is a clique in $G$ $\iff$ $V - S$ is a vertex cover in $\overline{G}$".

We prove this equivalence by proving the two corresponding implications
in turn.

(i) Let $S$ be a clique in $G$. This implies that there is no edge between the
vertices of $S$ in $\overline{G}$, i.e., every edge of $\overline{G}$ is incident to at least one vertex
in $V - S$. Thus, $V - S$ is a vertex cover in $\overline{G}$.

(ii) Let $C \subseteq V$ be a vertex cover in $\overline{G}$. Following the definition of a vertex
cover, every edge of $\overline{G}$ is incident to at least one vertex in $C$, i.e., there is
no edge $\{u, v\} \in \overline{E}$ such that both $u$ and $v$ belong to $V - C$. So, $\{u, v\} \in E$
for all $u, v \in V - C$, $u \neq v$, i.e., $V - C$ is a clique in $G$. □
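The reduction and the composition argument of Lemma 2.3.3.11 can be sketched together: `clique_via_vc` first computes $(\overline{G}, |V| - k)$ and then hands it to a vertex cover decider. Graphs are assumed to be given as a vertex count $n$ (vertices $0, \ldots, n-1$) and a set of two-element frozensets, and the vertex cover problem is taken as "is there a cover with at most $m$ vertices". Only `clique_to_vc` is the (polynomial-time) reduction; the two deciders are exponential brute-force stand-ins, since no polynomial-time algorithm for VC is known.

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple

Edge = FrozenSet[int]  # vertices are 0, 1, ..., n-1


def clique_to_vc(n: int, edges: Set[Edge], k: int) -> Tuple[int, Set[Edge], int]:
    """Map (G, k) to (complement of G, |V| - k) as in the proof of Lemma 2.3.3.16."""
    all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
    return n, all_pairs - edges, n - k


def has_vertex_cover(n: int, edges: Set[Edge], m: int) -> bool:
    """Brute force: is there a set of at most m vertices touching every edge?"""
    return any(
        all(edge & set(cover) for edge in edges)
        for size in range(0, min(m, n) + 1)  # empty range if m < 0
        for cover in combinations(range(n), size)
    )


def has_clique(n: int, edges: Set[Edge], k: int) -> bool:
    """Brute force: is there a set of k vertices that are pairwise adjacent?"""
    return any(
        all(frozenset(p) in edges for p in combinations(subset, 2))
        for subset in combinations(range(n), k)
    )


def clique_via_vc(n: int, edges: Set[Edge], k: int) -> bool:
    """Decide Clique by reducing to VC and running a VC decider (cf. Lemma 2.3.3.11)."""
    return has_vertex_cover(*clique_to_vc(n, edges, k))
```

On every graph with four vertices and every $k$ from 0 to 5, `clique_via_vc` and the direct test `has_clique` give the same answer, which is exactly the equivalence proved above.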

Exercise 2.3.3.17. Prove VC $\le_p$ Clique. □

Exercise 2.3.3.18. Prove 3Sat $\le_p$ VC. □

The next reduction transforms the language of Boolean formulae into the 
language of linear equations. 

Lemma 2.3.3.19. 3Sat $\le_p$ Sol-0/1-LP.

Proof. Let $\Phi = F_1 \land F_2 \land \cdots \land F_m$ be a formula in 3CNF over the set of
variables $X = \{x_1, \ldots, x_n\}$. Let $F_i = l_{i1} \lor l_{i2} \lor l_{i3}$ for $i = 1, \ldots, m$. First,
we construct a system of linear inequalities over $X$ as follows. For every $F_i$,
$i = 1, \ldots, m$, we take the linear inequality $LI_i$

$$Z_{i1} + Z_{i2} + Z_{i3} \ge 1,$$

where $Z_{ir} = x_k$ if $l_{ir} = x_k$ for some $k \in \{1, \ldots, n\}$, and $Z_{ir} = (1 - x_q)$ if
$l_{ir} = \overline{x}_q$ for some $q \in \{1, \ldots, n\}$. Obviously, $\varphi(F_i) = 1$ for an assignment
$\varphi : X \to \{0, 1\}$ if and only if $\varphi$ is a solution of the linear inequality $LI_i$.
Thus, every $\varphi$ satisfying $\Phi$ is a solution of the system of linear inequalities
$LI_1, LI_2, \ldots, LI_m$, and vice versa.

To get a system of linear equations, we take $2m$ new Boolean variables
(unknowns) $y_1, \ldots, y_m, w_1, \ldots, w_m$ and transform every $LI_i$ into the linear
equation

$$Z_{i1} + Z_{i2} + Z_{i3} - y_i - w_i = 1.$$

Clearly, the constructed system of linear equations has a solution if and only
if the system of linear inequalities $LI_1, LI_2, \ldots, LI_m$ has a solution: since
$y_i + w_i$ can take exactly the values 0, 1, and 2, the $i$-th equation can be
satisfied iff $Z_{i1} + Z_{i2} + Z_{i3} \in \{1, 2, 3\}$, i.e., iff $LI_i$ holds. □
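A sketch of the construction in Python, assuming a DIMACS-like clause encoding (the integer $k$ for $x_k$, $-k$ for $\overline{x}_k$) and storing each equation as a coefficient vector over the unknowns $x_1, \ldots, x_n, y_1, \ldots, y_m, w_1, \ldots, w_m$ together with its right-hand side; this representation is chosen here for illustration. The brute-force `has_01_solution` is exponential and serves only to test the equivalence on small formulas.

```python
from itertools import product
from typing import List, Tuple

# (coefficients of x_1..x_n, y_1..y_m, w_1..w_m; right-hand side)
Equation = Tuple[List[int], int]


def three_sat_to_equations(clauses: List[List[int]], n: int) -> List[Equation]:
    """Build Z_i1 + Z_i2 + Z_i3 - y_i - w_i = 1 for every clause F_i.

    A literal +k contributes x_k; a literal -q contributes (1 - x_q), whose
    constant 1 is moved to the right-hand side.
    """
    m = len(clauses)
    equations: List[Equation] = []
    for i, clause in enumerate(clauses):
        coeffs = [0] * (n + 2 * m)
        rhs = 1
        for lit in clause:
            if lit > 0:
                coeffs[lit - 1] += 1
            else:
                coeffs[-lit - 1] -= 1
                rhs -= 1
        coeffs[n + i] = -1        # -y_i
        coeffs[n + m + i] = -1    # -w_i
        equations.append((coeffs, rhs))
    return equations


def has_01_solution(equations: List[Equation], num_unknowns: int) -> bool:
    """Brute force over all 0/1 vectors (exponential); only for checking tiny cases."""
    return any(
        all(sum(a * v for a, v in zip(coeffs, vec)) == rhs for coeffs, rhs in equations)
        for vec in product((0, 1), repeat=num_unknowns)
    )
```

For instance, the single clause $x_1 \lor \overline{x}_2 \lor x_3$ yields the equation $x_1 - x_2 + x_3 - y_1 - w_1 = 0$, i.e., the coefficient vector `[1, -1, 1, -1, -1]` with right-hand side 0.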

Exercise 2.3.3.20. Prove the following reducibilities.

(i) Sat $\le_p$ Sol-0/1-LP,

(ii) 3Sat $\le_p$ Sol-IP$_p$ for every prime $p$,

(iii) Sol-0/1-LP $\le_p$ Sat,

(iv) Sat $\le_p$ Sol-IP,

(v) Clique $\le_p$ Sol-0/1-LP. □

Above, a successful machinery for proving the hardness of decision
problems under the assumption $\mathrm{P} \subsetneq \mathrm{NP}$ has been introduced. We would like to
have a method for proving a similar kind of hardness for optimization
problems, too. To develop it, we first introduce the classes PO and NPO of
optimization problems, which are the counterparts of the classes P and NP for
decision problems.

Definition 2.3.3.21. NPO is the class of optimization problems, where
$U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal) \in \mathrm{NPO}$ if the following conditions hold:

(i) $L_I \in \mathrm{P}$,

(ii) there exists a polynomial $p_U$ such that

a) for every $x \in L_I$ and every $y \in \mathcal{M}(x)$, $|y| \le p_U(|x|)$, and

b) there exists a polynomial-time algorithm that, for every $y \in \Sigma_O^*$ and
every $x \in L_I$ such that $|y| \le p_U(|x|)$, decides whether $y \in \mathcal{M}(x)$, and

(iii) the function $cost$ is computable in polynomial time.

Informally, we see that an optimization problem U is in NPO if 

(i) one can efficiently verify whether a string is an instance of $U$,

(ii) the size of the solutions is polynomial in the size of the problem instances 
and one can verify in polynomial time whether a string y is a solution to 
any given input instance x, and 

(iii) the cost of any solution can be efficiently determined. 




Condition (ii) shows the relation to NP, which can be seen as the class of languages accepted by polynomial-time verifiers. Conditions (i) and (iii) are natural because we are interested in problems whose kernel is the optimization itself, and not the tractability of deciding whether a given input is a consistent problem instance or of evaluating the cost function. We observe that the Max-Sat problem is in NPO because

(i) one can check in polynomial time whether a word $x \in \Sigma_{logic}^*$ represents a Boolean formula $\Phi_x$ in CNF,

(ii) for every $x$, any assignment $\alpha \in \{0,1\}^*$ to the variables of $\Phi_x$ has the property $|\alpha| \le |x|$, and one can verify whether $|\alpha|$ is equal to the number of variables of $\Phi_x$ even in linear time, and

(iii) for any given assignment $\alpha$ to the variables of $\Phi_x$, one can count the number of satisfied clauses of $\Phi_x$ in linear time.
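To make condition (iii) concrete, here is a minimal Python sketch of the Max-Sat cost function. It assumes a DIMACS-style encoding of $\Phi_x$: a list of clauses, each a list of nonzero integers, where $i$ stands for $x_i$ and $-i$ for its negation. An assignment is a tuple of 0/1 values. This encoding and the names `is_assignment` and `max_sat_cost` are our own illustration, not the book's $\Sigma_{logic}$ encoding. Both functions make a single pass over the literals, so they run in linear time.

```python
from typing import List, Sequence

# Assumed encoding: a clause is a list of nonzero ints; i > 0 stands for x_i, -i for NOT x_i.
Formula = List[List[int]]


def is_assignment(formula: Formula, alpha: Sequence[int]) -> bool:
    """True iff alpha gives a 0/1 value to each of x_1, ..., x_k (k = largest variable index)."""
    k = max((abs(lit) for clause in formula for lit in clause), default=0)
    return len(alpha) == k and all(bit in (0, 1) for bit in alpha)


def max_sat_cost(formula: Formula, alpha: Sequence[int]) -> int:
    """Number of clauses of the CNF formula satisfied by alpha; one pass over the literals."""
    satisfied = 0
    for clause in formula:
        for lit in clause:
            value = alpha[abs(lit) - 1]
            if (lit > 0 and value == 1) or (lit < 0 and value == 0):
                satisfied += 1
                break  # one true literal is enough for this clause
    return satisfied


# (x1 OR x2) AND (NOT x1) AND (NOT x2 OR x3)
phi = [[1, 2], [-1], [-2, 3]]
```

For example, the assignment $(0, 1, 1)$ satisfies all three clauses of `phi`, while $(1, 1, 0)$ satisfies only the first one.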

Exercise 2.3.3.22. Prove that the following optimization problems are in 
NPO: 

(i) Max-Cut, 

(ii) Max-CL, and

(iii) Min-VCP. □ 

The following definition corresponds to the natural idea of what tractable 
optimization problems are. 

Definition 2.3.3.23. PO is the class of optimization problems $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ such that

(i) $U \in \mathrm{NPO}$, and

(ii) there is a polynomial-time algorithm that, for every $x \in L_I$, computes an optimal solution for $x$.

In what follows we present a simple method for proving NP-hardness of optimization problems, in the sense that if an NP-hard optimization problem were in PO, then P would be equal to NP.

Definition 2.3.3.24. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ be an optimization problem from NPO. We define the threshold language of $U$ as

$$Lang_U = \{(x, a) \in L_I \times \Sigma_{bool}^* \mid Opt_U(x) \le Number(a)\}$$

if goal = minimum, and as

$$Lang_U = \{(x, a) \in L_I \times \Sigma_{bool}^* \mid Opt_U(x) \ge Number(a)\}$$

if goal = maximum.

We say that $U$ is NP-hard if $Lang_U$ is NP-hard.

The following lemma shows that proving the NP-hardness of $Lang_U$ is indeed a way of showing that $U$ is hard for polynomial-time computations.




2.3 Fundamentals of Algorithmics 133



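Lemma 2.3.3.25. If an optimization problem $U \in \mathrm{PO}$, then $Lang_U \in \mathrm{P}$.

Proof. If $U \in \mathrm{PO}$, then there is a polynomial-time algorithm $A$ that, for every input instance $x$ of $U$, computes an optimal solution $y$ for $x$. Since $U \in \mathrm{NPO}$, the cost function is computable in polynomial time, and so the value $Opt_U(x) = cost(y)$ can be computed in polynomial time, too. To decide whether a given pair $(x, a)$ belongs to $Lang_U$, one checks $x \in L_I$ and compares $Opt_U(x)$ with $Number(a)$. Thus, $A$ can be used to decide $Lang_U$ in polynomial time. □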

Theorem 2.3.3.26. Let $U$ be an optimization problem. If $Lang_U$ is NP-hard and $\mathrm{P} \neq \mathrm{NP}$, then $U \notin \mathrm{PO}$.

Proof. Assume the opposite, i.e., $U \in \mathrm{PO}$. Following Lemma 2.3.3.25, $Lang_U \in \mathrm{P}$. Since $Lang_U$ is NP-hard, $Lang_U \in \mathrm{P}$ directly implies $\mathrm{P} = \mathrm{NP}$, a contradiction. □

To illustrate the simplicity of this method for proving the NP-hardness of 
optimization problems, we present the following examples. 

Lemma 2.3.3.27. Max-Sat is NP-hard.

Proof. Following Definition 2.3.3.24, we have to show that $Lang_{Max\text{-}Sat}$ is NP-hard. Since we know that Sat is NP-hard, it is sufficient to prove $\mathrm{Sat} \le_p Lang_{Max\text{-}Sat}$. This reduction is straightforward. Let $x$ code a formula $\Phi_x$ of $m$ clauses. One takes $(x, a)$ with $Number(a) = m$ as the input for a polynomial-time algorithm for $Lang_{Max\text{-}Sat}$. Obviously, $(x, a) \in Lang_{Max\text{-}Sat}$ iff all $m$ clauses of $\Phi_x$ can be satisfied simultaneously, i.e., iff $\Phi_x$ is satisfiable. □
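The sketch below illustrates Definition 2.3.3.24, the idea of Lemma 2.3.3.25, and the reduction just given, all for Max-Sat. It assumes CNF formulas encoded as lists of clauses, each a list of nonzero integers ($i$ for $x_i$, $-i$ for its negation). `opt_max_sat` finds $Opt(x)$ by exhaustive search over all assignments, so it is exponential. It only stands in for the hypothetical polynomial-time optimal algorithm that $U \in \mathrm{PO}$ would provide. All function names are illustrative.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Assumed encoding: a clause is a list of nonzero ints; i > 0 stands for x_i, -i for NOT x_i.
Formula = List[List[int]]


def num_satisfied(formula: Formula, alpha: Sequence[int]) -> int:
    """cost(alpha): number of clauses with at least one true literal."""
    return sum(
        any((lit > 0) == (alpha[abs(lit) - 1] == 1) for lit in clause)
        for clause in formula
    )


def opt_max_sat(formula: Formula) -> int:
    """Opt(x) by exhaustive search over all 2^k assignments (exponential time)."""
    k = max((abs(lit) for clause in formula for lit in clause), default=0)
    return max(num_satisfied(formula, alpha) for alpha in product((0, 1), repeat=k))


def in_lang_max_sat(formula: Formula, a: str) -> bool:
    """Membership in Lang_Max-Sat (goal = maximum): Opt(x) >= Number(a), a a binary string."""
    return opt_max_sat(formula) >= int(a, 2)


def sat_to_lang_max_sat(formula: Formula) -> Tuple[Formula, str]:
    """Reduction Sat <=_p Lang_Max-Sat: x -> (x, a) with Number(a) = m, the number of clauses."""
    return formula, format(len(formula), "b")
```

The reduction itself only counts clauses and writes $m$ in binary, so it clearly runs in polynomial time. For example, the unsatisfiable formula $x_1 \wedge \bar{x}_1$ is mapped to $(x, 10)$, but $Opt(x) = 1 < 2$.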

Lemma 2.3.3.28. Max-CL is NP-hard.

Proof. Observe that $\mathrm{Clique} = Lang_{Max\text{-}CL}$. Since we have already proved that Clique is NP-hard, the proof is completed. □

Exercise 2.3.3.29. Prove that the following optimization problems are NP-hard.

(i) Max-3Sat,

(ii*) Max-2Sat⁴⁶,

(iii) Min-VCP,

(iv) SCP,

(v) SKP,

(vi) Max-Cut,

(vii) TSP, and

(viii) Max-E3LinMod2. □

Exercise 2.3.3.30. Prove that $\mathrm{P} \neq \mathrm{NP}$ implies $\mathrm{PO} \neq \mathrm{NPO}$. □

Keywords introduced in Section 2.3.3 

uniform cost measurement, logarithmic cost measurement, worst case complexity, 
time complexity, space complexity, lower and upper bounds on problem complexity, 
complexity classes P, NP, PO, and NPO, verifiers, polynomial-time reduction, 
NP-hardness, NP-completeness, NP-hardness of optimization problems 



46 Observe that $2\mathrm{Sat} \in \mathrm{P}$.





Summary of Section 2.3.3 

Time complexity and space complexity are the fundamental complexity measures. 
Time complexity is measured as the number of elementary operations executed 
over computer words (operands of constant size). If one executes operations over 
unboundedly large operands, then one should consider the logarithmic cost mea- 
surement, where the cost of one operation is proportional to the length of the 
representation of the operands. 

The (worst case) time complexity of an algorithm $A$ is a function $Time_A(n)$ of the input size. Within $Time_A(n)$ steps, $A$ computes the output for each input of size $n$, and there is an input of size $n$ on which $A$ runs exactly $Time_A(n)$ steps.

The class P is the class of all languages that can be decided in polynomial time. Every decision problem in P is considered to be tractable (practically solvable). The class P is invariant with respect to the choice of any reasonable model of computation.

The class NP is the class of all languages that are accepted by polynomial-time nondeterministic algorithms, or, equivalently, by (deterministic) polynomial-time verifiers. One conjectures that $\mathrm{P} \subsetneq \mathrm{NP}$, because if P were equal to NP, then, for many mathematical problems, finding a solution would be no harder than verifying the correctness of a given solution. Proving or disproving $\mathrm{P} \subsetneq \mathrm{NP}$ is currently one of the most challenging open problems in computer science and mathematics.

The concept of NP-completeness provides a method for proving intractability (hardness) of specific decision problems, assuming $\mathrm{P} \neq \mathrm{NP}$. In fact, an NP-complete problem encodes every other problem from NP in some way, and, for any decision problem from NP, this encoding can be computed efficiently. NP-complete problems are considered to be hard, because if an NP-complete problem were in P, then P would be equal to NP. This concept can be extended to optimization problems, too.

2.3.4 Algorithm Design Techniques 

Over the years people have identified several general techniques (concepts) that often yield efficient algorithms for large classes of problems. This section provides a short overview of the most important algorithm design techniques, namely,

divide-and-conquer,
dynamic programming,
backtracking,
local search, and
greedy algorithms.

We assume that the reader is familiar with all these techniques and knows many specific algorithms designed by them. Thus, we restrict our effort in this section to a short description of these techniques and their basic properties.




These techniques are so successful that, when facing a new algorithmic problem, the most reasonable approach is to check whether one of them alone can provide an efficient solution. Nevertheless, none of these techniques alone is able to solve NP-hard problems efficiently. Later, we shall see that these methods, combined with some new ideas and approaches, can be helpful in designing practical algorithms for hard problems. But this is the topic of subsequent chapters.

In what follows we present the above-mentioned methods for the design of efficient algorithms in a uniform way. For every technique, we start with its general description and continue with a simple illustration of its application.

DIVIDE-AND-CONQUER.

Informally, the divide-and-conquer technique is based on breaking the given input instance into several smaller input instances in such a way that from the solutions to the smaller input instances one can easily compute a solution to the original input instance. Since the solutions to the smaller problem instances are computed in the same way, the application of the divide-and-conquer principle is naturally expressed by a recursive procedure. In what follows, we give a more detailed description of this technique. Let $U$ be any algorithmic⁴⁷ problem.

Divide-and-Conquer Algorithm for U

Input: An input instance $I$ of $U$, with $size(I) = n$, $n \ge 1$.

Step 1: if $n = 1$ then compute the output to $I$ by any method
        else continue with Step 2.

Step 2: Using $I$, derive problem instances $I_1, I_2, \ldots, I_k$, $k \in \mathbb{N} - \{0\}$, of $U$
        such that $size(I_j) = n_j < n$ for $j = 1, 2, \ldots, k$.
        {Usually, Step 2 is executed by partitioning $I$ into $I_1, I_2, \ldots, I_k$.
        $I_1, I_2, \ldots, I_k$ are also called subinstances of $I$.}

Step 3: Compute the outputs $U(I_1), \ldots, U(I_k)$ to the inputs (subinstances)
        $I_1, \ldots, I_k$, respectively, by recursively using the same procedure.

Step 4: Compute the output to $I$ from the outputs $U(I_1), \ldots, U(I_k)$.
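The four steps above map directly onto a higher-order function. The following Python sketch is one possible rendering. The names `divide_and_conquer`, `is_small`, `solve_small`, `divide`, and `combine` are our own. It is instantiated with Mergesort. For convenience, the base case accepts instances of size at most 1, so that the empty list is handled too.

```python
from typing import Callable, List, TypeVar

Inst = TypeVar("Inst")  # type of problem instances
Out = TypeVar("Out")    # type of outputs


def divide_and_conquer(
    instance: Inst,
    is_small: Callable[[Inst], bool],
    solve_small: Callable[[Inst], Out],
    divide: Callable[[Inst], List[Inst]],
    combine: Callable[[Inst, List[Out]], Out],
) -> Out:
    """Generic schema: Steps 1-4 of the Divide-and-Conquer Algorithm for U."""
    if is_small(instance):                        # Step 1: solve directly
        return solve_small(instance)
    subinstances = divide(instance)               # Step 2: derive I_1, ..., I_k
    outputs = [                                   # Step 3: recursive calls
        divide_and_conquer(sub, is_small, solve_small, divide, combine)
        for sub in subinstances
    ]
    return combine(instance, outputs)             # Step 4: build U(I)


def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists in linear time (the f(n) part of the cost)."""
    result: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]


def merge_sort(xs: List[int]) -> List[int]:
    return divide_and_conquer(
        xs,
        is_small=lambda s: len(s) <= 1,
        solve_small=list,
        divide=lambda s: [s[: len(s) // 2], s[len(s) // 2 :]],
        combine=lambda _instance, outs: merge(outs[0], outs[1]),
    )
```

Here $k = 2$, $g(n)$ is the cost of slicing, and $f(n)$ is the linear cost of `merge`.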

In the standard case the complexity analysis results in solving a recurrence. Following our general schema, the time complexity of a divide-and-conquer algorithm $A$ may be computed as follows:

$$Time_A(1) \le b \quad \text{if Step 1 can be done in time complexity } b \text{ for every input of } U \text{ of size } 1,$$

$$Time_A(n) = \sum_{i=1}^{k} Time_A(n_i) + g(n) + f(n),$$

where $g(n)$ is the time complexity of partitioning $I$ of size $n$ into subinstances $I_1, \ldots, I_k$ (Step 2), and $f(n)$ is the time complexity of computing $U(I)$ from $U(I_1), \ldots, U(I_k)$ (Step 4).

47 Decision problem, optimization problem, or any other problem.

In almost all cases Step 2 partitions $I$ into $k$ subinstances of the same size $\lceil n/m \rceil$. Thus, the typical recurrence has the form

$$Time_A(n) = k \cdot Time_A\left(\left\lceil \frac{n}{m} \right\rceil\right) + h(n)$$

for a nondecreasing function $h$ and some constants $k$ and $m$. How to solve such recurrences was discussed in Section 2.2.

Some famous examples of the divide-and-conquer technique are binary search, Mergesort and Quicksort for sorting, and Strassen's matrix multiplication algorithm. There are numerous applications of this technique, and it is one of the most widely applicable algorithm design techniques. Here, we illustrate it on the problem of multiplying long integers.

Example (The problem of multiplying large integers). Let

$$a = a_n a_{n-1} \ldots a_1 \quad \text{and} \quad b = b_n b_{n-1} \ldots b_1$$

be binary representations of two integers $Number(a)$ and $Number(b)$, $n = 2^k$ for some positive integer $k$. The aim is to compute the binary representation of $Number(a) \cdot Number(b)$. Recall that the elementary school algorithm involves computing $n$ partial products of $a_n a_{n-1} \ldots a_1$ by $b_i$, for $i = 1, \ldots, n$, and so its complexity is in $O(n^2)$.

A naive divide-and-conquer approach can work as follows. One breaks each of $a$ and $b$ into two integers of $n/2$ bits each:

$$A = Number(a) = \underbrace{Number(a_n \ldots a_{n/2+1})}_{A_1} \cdot 2^{n/2} + \underbrace{Number(a_{n/2} \ldots a_1)}_{A_2},$$

$$B = Number(b) = \underbrace{Number(b_n \ldots b_{n/2+1})}_{B_1} \cdot 2^{n/2} + \underbrace{Number(b_{n/2} \ldots b_1)}_{B_2}.$$

The product of $Number(a)$ and $Number(b)$ can be written as

$$A \cdot B = A_1 \cdot B_1 \cdot 2^n + (A_1 \cdot B_2 + B_1 \cdot A_2) \cdot 2^{n/2} + A_2 \cdot B_2. \tag{2.28}$$

Designing a divide-and-conquer algorithm based on the equality (2.28), we see that the multiplication of two $n$-bit integers is reduced to

• four multiplications of $(n/2)$-bit integers ($A_1 \cdot B_1$, $A_1 \cdot B_2$, $B_1 \cdot A_2$, $A_2 \cdot B_2$),

• three additions of integers with at most $2n$ bits, and

• two shifts (multiplications by $2^n$ and $2^{n/2}$).

Since these additions and shifts can be done in $cn$ steps for some suitable constant $c$, the complexity of this algorithm is given by the following recurrence:

$$\begin{aligned} Time(1) &= 1,\\ Time(n) &= 4 \cdot Time\left(\frac{n}{2}\right) + cn. \end{aligned} \tag{2.29}$$

Following the Master Theorem, the solution of (2.29) is $Time(n) \in \Theta(n^2)$. From the asymptotic point of view, this is no improvement over the classical school method. To get an improvement, one needs to decrease the number of subproblems, i.e., the number of multiplications of $(n/2)$-bit integers. This can be done with the following formula



$$A \cdot B = A_1 B_1 \cdot 2^n + \left[A_1 B_1 + A_2 B_2 + (A_1 - A_2) \cdot (B_2 - B_1)\right] \cdot 2^{n/2} + A_2 B_2 \tag{2.30}$$

because

$$\begin{aligned} (A_1 - A_2) \cdot (B_2 - B_1) + A_1 B_1 + A_2 B_2 &= A_1 B_2 - A_1 B_1 - A_2 B_2 + A_2 B_1 + A_1 B_1 + A_2 B_2 \\ &= A_1 B_2 + A_2 B_1. \end{aligned}$$

Although (2.30) looks more complicated than (2.28), it requires only

• three multiplications of $(n/2)$-bit integers ($A_1 \cdot B_1$, $(A_1 - A_2) \cdot (B_2 - B_1)$, and $A_2 \cdot B_2$),

• four additions and two subtractions of integers of at most $2n$ bits, and

• two shifts (multiplications by $2^n$ and $2^{n/2}$).

Thus, the divide-and-conquer algorithm $C$ based on (2.30) has the time complexity given by the recurrence

$$\begin{aligned} Time_C(1) &= 1,\\ Time_C(n) &= 3 \cdot Time_C\left(\frac{n}{2}\right) + dn \end{aligned} \tag{2.31}$$

for a suitable constant $d$. According to the Master Theorem (Section 2.2.2), the solution of (2.31) is $Time_C(n) \in O(n^{\log_2 3})$, where $\log_2 3 \approx 1.59$. So, $C$ is asymptotically faster than the school method.⁴⁸ □
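Here is a minimal Python sketch of algorithm $C$ that follows (2.30) literally. The operand $x$ is split into $A_1$ (high half) and $A_2$ (low half), and only the three products $A_1 B_1$, $A_2 B_2$, and $(A_1 - A_2)(B_2 - B_1)$ are computed recursively. Two details are our assumptions rather than the book's. First, the function accepts any bit length $n$, not only powers of two, and handles the possibly negative middle factors by recursing on absolute values. Second, below a cutoff of 32 bits it uses the built-in multiplication, in the spirit of footnote 48; the cutoff value itself is arbitrary.

```python
def karatsuba(x: int, y: int, n: int) -> int:
    """Multiply x and y, |x|, |y| < 2**n, via formula (2.30) with three recursive products."""
    if n <= 32:                                   # small operands: multiply directly
        return x * y
    sign = -1 if (x < 0) != (y < 0) else 1
    x, y = abs(x), abs(y)
    half = n // 2
    mask = (1 << half) - 1
    a1, a2 = x >> half, x & mask                  # x = a1 * 2**half + a2
    b1, b2 = y >> half, y & mask                  # y = b1 * 2**half + b2
    p1 = karatsuba(a1, b1, n - half)              # A1 * B1
    p2 = karatsuba(a2, b2, n - half)              # A2 * B2
    p3 = karatsuba(a1 - a2, b2 - b1, n - half)    # (A1 - A2) * (B2 - B1), may be negative
    middle = p1 + p2 + p3                         # = A1*B2 + A2*B1 (identity after (2.30))
    return sign * ((p1 << (2 * half)) + (middle << half) + p2)
```

Each call on $n$ bits makes three calls on $\lceil n/2 \rceil$ bits plus a linear amount of shifting and adding, which is exactly recurrence (2.31).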

DYNAMIC PROGRAMMING. 

The similarity between divide-and-conquer and dynamic programming is that both approaches solve problems by combining the solutions to problem subinstances. The difference is that divide-and-conquer does it recursively by dividing a problem instance into subinstances and calling itself on these problem subinstances,⁴⁹ while dynamic programming works in a bottom-up fashion as follows. It starts by computing solutions to the smallest (simplest) subinstances, and continues to larger and larger subinstances until the original problem instance has been solved. During its work, any algorithm based on dynamic programming stores all solutions to problem subinstances in a table. Thus, a dynamic-programming algorithm solves every problem subinstance just once, because the solutions are saved and reused every time the subproblem is encountered. This is the main advantage of dynamic programming over divide-and-conquer. The divide-and-conquer method can result in solving exponentially many problem subinstances despite the fact that there is only a polynomial number of different problem subinstances. This means that a divide-and-conquer algorithm may solve some subinstances several times.⁵⁰

48 Note that the school method is superior to C for integers with fewer than 500 bits because the constant d is too large.

49 Thus, divide-and-conquer is a top-down method.

Perhaps the most transparent example of the difference between dynamic programming and divide-and-conquer is the computation of the $n$th Fibonacci number $F(n)$. Recall that

$$F(1) = F(2) = 1 \quad \text{and} \quad F(n) = F(n-1) + F(n-2) \text{ for } n \ge 3.$$

The dynamic-programming algorithm $A$ successively computes

"$F(1), F(2), F(3) = F(1) + F(2), \ldots, F(n) = F(n-1) + F(n-2)$".

Obviously, $Time_A$ is linear in the value of $n$. A divide-and-conquer algorithm DCF on the input $n$ recursively calls $DCF(n-1)$ and $DCF(n-2)$. $DCF(n-1)$ again recursively calls $DCF(n-2)$ and $DCF(n-3)$, etc. The tree in Figure 2.18 depicts a part of the recursive calls of DCF. One easily observes that the number of subproblem calls is exponential in $n$, and so $Time_{DCF}$ is exponential in $n$.
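The gap between algorithm $A$ and DCF is easy to observe experimentally. In the Python sketch below (function names are ours), `fib_dp` follows the bottom-up computation of $A$. Instead of a full table, it keeps only the last two entries, which is all the recurrence needs. `fib_dcf` is the plain recursion of DCF and additionally returns the number of calls it made.

```python
from typing import Tuple


def fib_dp(n: int) -> int:
    """Algorithm A: compute F(1), F(2), ..., F(n) bottom-up, keeping the last two values."""
    prev, curr = 1, 1                  # F(1), F(2)
    for _ in range(3, n + 1):
        prev, curr = curr, prev + curr
    return curr


def fib_dcf(n: int) -> Tuple[int, int]:
    """Algorithm DCF: plain recursion; returns (F(n), number of calls made)."""
    if n <= 2:
        return 1, 1
    f1, calls1 = fib_dcf(n - 1)
    f2, calls2 = fib_dcf(n - 2)
    return f1 + f2, calls1 + calls2 + 1
```

The number of calls $C(n)$ of DCF satisfies $C(1) = C(2) = 1$ and $C(n) = C(n-1) + C(n-2) + 1$, which gives $C(n) = 2F(n) - 1$. For example, `fib_dcf(20)` returns `(6765, 13529)`, whereas `fib_dp(20)` performs only 18 additions.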

Exercise 2.3.4.1. Estimate, for every $i \in \{1, 2, \ldots, n-1\}$, how many times the subproblem $F(n-i)$ is solved by DCF. □

Exercise 2.3.4.2. Consider the problem of computing $\binom{n}{k}$ with the formula

$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}. \tag{2.32}$$



What is the difference between the time complexities of divide-and-conquer 
and dynamic programming when both methods use the formula (2.32) to 
compute the result? (Note that the complexity has to be measured in both 
input parameters n and k.) □ 

Some of the well-known applications of dynamic programming are the Floyd algorithm for the shortest path problem, the algorithm for the minimum triangulation problem, the optimal merge pattern algorithm, and the pseudo-polynomial algorithm for the knapsack problem (presented in Chapter 3). We illustrate the method of dynamic programming with the Floyd algorithm.

50 Note that it sometimes happens that the natural way of dividing an input instance leads to overlapping subinstances, and so the number of different subinstances may even be exponential.

Fig. 2.18.

Example. The all-pairs shortest path problem is to find the costs of the shortest paths between all pairs of vertices of a given weighted graph $(G, c)$, where $G = (V, E)$ and $c: E \to \mathbb{N} - \{0\}$. Let $V = \{v_1, v_2, \ldots, v_n\}$.

The idea is to consecutively compute the values

$Cost_k(i, j)$ = the cost of the shortest path between $v_i$ and $v_j$ whose internal vertices are from $\{v_1, v_2, \ldots, v_k\}$

for $k = 0, 1, \ldots, n$. At the beginning one sets

$$Cost_0(i, j) = \begin{cases} c(\{v_i, v_j\}) & \text{if } \{v_i, v_j\} \in E, \\ \infty & \text{if } \{v_i, v_j\} \notin E \text{ and } i \neq j, \\ 0 & \text{if } i = j. \end{cases}$$

To compute $Cost_k(i, j)$ from the values $Cost_{k-1}(r, s)$ for $r, s \in \{1, \ldots, n\}$, one can use the formula

$$Cost_k(i, j) = \min\{Cost_{k-1}(i, j),\ Cost_{k-1}(i, k) + Cost_{k-1}(k, j)\}$$

for all $i, j \in \{1, 2, \ldots, n\}$. This is because the shortest path between $v_i$ and $v_j$ that goes only via vertices from $\{v_1, \ldots, v_k\}$ either does not go via the vertex $v_k$, or, if it does, it visits $v_k$ exactly once.⁵¹

51 The shortest path does not contain cycles because all costs on edges are positive.
The following algorithm is the straightforward implementation of this strategy.

FLOYD'S ALGORITHM

Input: A graph $G = (V, E)$, $V = \{v_1, \ldots, v_n\}$, $n \in \mathbb{N} - \{0\}$, and a cost function $c: E \to \mathbb{N} - \{0\}$.

Step 1: for i := 1 to n do
          begin Cost[i,i] := 0;
                for j := 1 to n do
                    if $\{v_i, v_j\} \in E$ then Cost[i,j] := $c(\{v_i, v_j\})$
                    else if $i \neq j$ then Cost[i,j] := $\infty$
          end

Step 2: for k := 1 to n do
          for i := 1 to n do
            for j := 1 to n do
              Cost[i,j] := min{Cost[i,j], Cost[i,k] + Cost[k,j]}.

Obviously, the complexity of Floyd's algorithm is in $O(n^3)$. □
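A direct Python transcription of Floyd's algorithm is sketched below under our own conventions. Vertices are numbered $0, \ldots, n-1$ instead of $1, \ldots, n$. The cost function is a dictionary from two-element `frozenset`s to positive integers, and `math.inf` plays the role of $\infty$. As in Step 2, a single table is updated in place. This is safe because row $k$ and column $k$ do not change during round $k$.

```python
import math
from typing import Dict, FrozenSet, List


def floyd(n: int, cost: Dict[FrozenSet[int], int]) -> List[List[float]]:
    """All-pairs shortest path costs; vertices are 0..n-1, cost maps edges {u, v} to c({u, v})."""
    # Step 1: Cost_0(i, j) is c({v_i, v_j}) for edges, 0 on the diagonal, infinity otherwise.
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for edge, c in cost.items():
        i, j = tuple(edge)
        dist[i][j] = dist[j][i] = c
    # Step 2: after round k, paths may use the intermediate vertices 0..k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist
```

For instance, with edges $\{0,1\}$ of cost 4, $\{1,2\}$ of cost 1, and $\{0,2\}$ of cost 10, the computed distance between vertices 0 and 2 is 5. An isolated vertex keeps distance $\infty$ to all other vertices.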

BACKTRACKING. 

Backtracking is a method for solving optimization problems by a possibly 
exhaustive search of the set of all feasible solutions or for determining an 
optimal strategy in a finite game by a search in the set of all configurations 
of the game. Here, we are interested only in the application of backtracking 
for optimization problems. 

In order to be able to apply the backtrack method to an optimization problem, one needs to introduce some structure into the set of all feasible solutions. If the specification of every feasible solution can be viewed as an $n$-tuple $(p_1, p_2, \ldots, p_n)$, where every $p_i$ can be chosen from a finite set $P_i$, then the following construction brings a structure into the set of all feasible solutions.

Let $\mathcal{M}(x)$ be the set of all feasible solutions to the input instance $x$ of an optimization problem.

We define $T_{\mathcal{M}(x)}$ as a labeled rooted tree with the following properties:

(i) Every vertex $v$ of $T_{\mathcal{M}(x)}$ is labeled by a set $S_v \subseteq \mathcal{M}(x)$.

(ii) The root of $T_{\mathcal{M}(x)}$ is labeled by $\mathcal{M}(x)$.

(iii) If $v_1, \ldots, v_m$ are all sons of a father $v$ in $T_{\mathcal{M}(x)}$, then $S_v = \bigcup_{i=1}^{m} S_{v_i}$ and $S_{v_i} \cap S_{v_j} = \emptyset$ for $i \neq j$.

{The sets corresponding to the sons define a partition of the set of their father.}

(iv) For every leaf $u$ of $T_{\mathcal{M}(x)}$, $|S_u| \le 1$.

{The leaves correspond to the feasible solutions of $\mathcal{M}(x)$.}





If every feasible solution can be specified as described above, one can start to build $T_{\mathcal{M}(x)}$ by setting $p_1 = a$. Then the left son⁵² of the root corresponds to the set of feasible solutions with $p_1 = a$, and the right son of the root corresponds to the set of feasible solutions with $p_1 \neq a$. Continuing with this strategy, the tree $T_{\mathcal{M}(x)}$ can be constructed in a straightforward way.⁵³ Having the tree $T_{\mathcal{M}(x)}$, the backtrack method is nothing else than a search (depth-first search or breadth-first search) in $T_{\mathcal{M}(x)}$. In fact, one does not implement backtracking as a two-phase algorithm, where $T_{\mathcal{M}(x)}$ is created in the first phase and the second phase is the depth-first search in $T_{\mathcal{M}(x)}$. This approach would require too much memory. The tree $T_{\mathcal{M}(x)}$ is considered only hypothetically, and one starts the depth-first search in it directly. Thus, it is sufficient to save only the path from the root to the current vertex.

In the following example we illustrate the backtrack method on the TSP 
problem. 

Example 2.3.4.3 (Backtracking for TSP). Let $x = (G, c)$, $G = (V, E)$, $V = \{v_1, v_2, \ldots, v_n\}$, $E = \{e_{ij} \mid i, j \in \{1, \ldots, n\}, i \neq j\}$ be an input instance of TSP. Any feasible solution (a Hamiltonian tour) to $(G, c)$ can be unambiguously specified by an $(n-1)$-tuple $(e_{1 i_1}, e_{i_1 i_2}, \ldots, e_{i_{n-2} i_{n-1}}) \in E^{n-1}$, where $\{1, i_1, i_2, \ldots, i_{n-1}\} = \{1, 2, \ldots, n\}$ (i.e., $v_1, v_{i_1}, \ldots, v_{i_{n-1}}, v_1$ is the Hamiltonian cycle that consists of the edges $e_{1 i_1}, e_{i_1 i_2}, \ldots, e_{i_{n-2} i_{n-1}}, e_{i_{n-1} 1}$). Let, in what follows, $S_x(h_1, \ldots, h_r, \overline{k_1}, \ldots, \overline{k_s})$ denote the subset of feasible solutions containing the edges $h_1, \ldots, h_r$ and not containing any of the edges $k_1, \ldots, k_s$, $r, s \in \mathbb{N}$. $T_{\mathcal{M}(x)}$ can be created by dividing $\mathcal{M}(x)$ into $S_x(e_{12}) \subseteq \mathcal{M}(x)$, which consists of all feasible solutions that contain the edge $e_{12}$, and $S_x(\overline{e_{12}})$, which consists of all Hamiltonian tours that do not contain the edge $e_{12}$, etc. Next, we construct $T_{\mathcal{M}(x)}$ for the input instance $x = (G, c)$ depicted in Figure 2.19. $T_{\mathcal{M}(x)}$ is depicted in Figure 2.20. We set $S_x(\emptyset) = \mathcal{M}(x)$.




Fig. 2.19. 



Let us show that all leaves of $T_{\mathcal{M}(x)}$ are one-element subsets of $\mathcal{M}(x)$. $S_x(e_{12}, e_{23}) = \{(e_{12}, e_{23}, e_{34}, e_{41})\}$ because the only possibility of closing a Hamiltonian tour starting with $v_1, v_2, v_3$ is to continue with $v_4$ and then finally with $v_1$. The cost of this tour is 12. Every Hamiltonian tour that contains $e_{12}$ and does not contain $e_{23}$ must contain $e_{24}$. So, $S_x(e_{12}, \overline{e_{23}}) = \{(e_{12}, e_{24}, e_{43}, e_{31})\}$, and the corresponding Hamiltonian tour $v_1, v_2, v_4, v_3, v_1$ has cost 7. $S_x(\overline{e_{12}}) = \{(e_{13}, e_{32}, e_{24}, e_{41})\} = S_x(e_{13}, e_{14})$, because if $e_{12}$ is not in a Hamiltonian tour, then $e_{13}$ and $e_{14}$ must be in this tour.⁵⁴ Thus, $v_4, v_1, v_3$ (or $v_3, v_1, v_4$) must be part of the Hamiltonian tour. Because only the vertex $v_2$ is missing, one has to take the edges $e_{24}$ and $e_{23}$, and the resulting Hamiltonian tour is $v_4, v_1, v_3, v_2, v_4 = v_1, v_3, v_2, v_4, v_1$. Its cost is 15. □

Fig. 2.20.

52 $T_{\mathcal{M}(x)}$ does not necessarily need to be a binary tree, i.e., one may use another strategy to create it.

53 A specific construction is done in the following example.

We shall present a more complex example in Section 3.4, where the use of 
the backtrack method for solving hard problems will be discussed. 

One could ask why we do not simply use the brute-force approach by enumerating all $|P_1| \cdot |P_2| \cdot \ldots \cdot |P_{n-1}|$ [$(n-1)^{n-1}$ many in the case of TSP] tuples $(p_1, \ldots, p_{n-1})$ and looking at the costs of those that specify a Hamiltonian tour. There are at least two possible advantages of backtracking over the brute-force method. First of all, backtracking generates only feasible solutions, and so their number may be essentially smaller than $|P_1| \cdot \ldots \cdot |P_{n-1}|$. For instance, we have at most $(n-1)!/2$ Hamiltonian tours in a graph of $n$ vertices, but the brute-force method would generate $(n-1)^{n-1}$ tuples as candidates for a feasible solution. The main advantage is that we do not necessarily need to realize a complete search in $T_{\mathcal{M}(x)}$. Suppose we find a feasible solution $\alpha$ with cost $m$, and we can determine in some vertex $v$ of $T_{\mathcal{M}(x)}$ that no solution corresponding to the subtree $T_v$ rooted at $v$ can be better than $\alpha$ (for a minimization problem, their costs are at least $m$). Then we can omit the search in $T_v$. If one is lucky, the search in many subtrees may be omitted and the backtrack method becomes more efficient. But we stop the discussion of this topic here, because making backtracking more efficient in applications to hard optimization problems is the main topic of Section 3.4 in the next chapter.

54 Note that, for every vertex $v$ of $G$, there are exactly two edges adjacent to $v$ in any Hamiltonian tour of $G$.
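The following Python sketch shows backtracking for TSP with the pruning rule described above. It does not use the binary tree of Example 2.3.4.3. Instead, as footnote 52 permits, each vertex of the implicit tree is a path starting at the first vertex, and its sons extend the path by one unvisited vertex. Only the current path is stored. A subtree is skipped as soon as the cost of the partial path reaches the best cost found so far, which is valid because all edge costs are positive. The matrix `FIG_2_19` is our reconstruction of Figure 2.19, which is not reproduced here, so treat its values as an assumption. It is derived from the tour costs 12, 7, and 15 in Example 2.3.4.3 and the spanning-tree costs in Examples 2.3.4.4 and 2.3.4.6.

```python
import math
from typing import List, Optional, Sequence, Tuple


def tsp_backtracking(c: Sequence[Sequence[int]]) -> Tuple[float, Optional[List[int]]]:
    """Depth-first search over partial tours starting at vertex 0, with cost-based pruning."""
    n = len(c)
    best_cost: float = math.inf
    best_tour: Optional[List[int]] = None
    path = [0]                        # the only part of the tree that is stored
    visited = [False] * n
    visited[0] = True

    def extend(cost_so_far: float) -> None:
        nonlocal best_cost, best_tour
        if cost_so_far >= best_cost:  # prune T_v: positive costs can only increase the cost
            return
        if len(path) == n:            # leaf: exactly one tour, close it back to vertex 0
            total = cost_so_far + c[path[-1]][0]
            if total < best_cost:
                best_cost, best_tour = total, path + [0]
            return
        for v in range(1, n):
            if not visited[v]:
                visited[v] = True
                path.append(v)
                extend(cost_so_far + c[path[-2]][v])
                path.pop()
                visited[v] = False

    extend(0)
    return best_cost, best_tour


# Assumed costs for Figure 2.19 (v_1..v_4 numbered 0..3), reconstructed from the text.
FIG_2_19 = [
    [0, 1, 2, 8],
    [1, 0, 2, 3],
    [2, 2, 0, 1],
    [8, 3, 1, 0],
]
```

On these assumed costs the search returns cost 7 with the tour $v_1, v_2, v_4, v_3, v_1$, the same optimum as in Example 2.3.4.3. The subtree starting with the edge $\{v_1, v_4\}$ of cost 8 is pruned without being explored.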

LOCAL SEARCH. 

Local search is an algorithm design technique for optimization problems. In contrast to backtracking, we do not try to execute an exhaustive search of the set of feasible solutions, but rather a restricted search in this set. If one wants to design a local search algorithm, it is necessary to start by defining a structure on the set of all feasible solutions. The usual way to do it is to say that two feasible solutions $\alpha$ and $\beta$ are neighbors if $\alpha$ can be obtained from $\beta$, and vice versa, by some local (small) change of their specification. Such local changes are usually called local transformations. For instance, two Hamiltonian tours $H_1$ and $H_2$ are neighbors if one can obtain $H_2$ from $H_1$ by exchanging two edges of $H_1$ for two other edges. Obviously, any neighborhood on $\mathcal{M}(x)$ is a relation on $\mathcal{M}(x)$. One can view $\mathcal{M}(x)$ as a graph $G(\mathcal{M}(x))$ whose vertices are feasible solutions, and two solutions $\alpha$ and $\beta$ are connected by the edge $\{\alpha, \beta\}$ if and only if $\alpha$ and $\beta$ are neighbors. Local search is nothing else but a search in $G(\mathcal{M}(x))$ where one moves via an edge $\{\alpha, \beta\}$ from $\alpha$ to $\beta$ only if $cost(\beta) > cost(\alpha)$ for maximization problems and $cost(\beta) < cost(\alpha)$ for minimization problems. Thus, local search can be viewed as an iterative improvement that halts with a feasible solution that cannot be improved by moving to any of its neighbors. In this sense, the outputs produced by local search are local optima of $\mathcal{M}(x)$.

Once the neighborhood is defined, one can briefly describe local search algorithms by the following scheme.

Input: An input instance $x$.

Step 1: Start with a (randomly chosen) feasible solution $\alpha$ from $\mathcal{M}(x)$.

Step 2: Replace $\alpha$ with a neighbor of $\alpha$ whose cost is an improvement in comparison with the cost of $\alpha$.

Step 3: Repeat Step 2 until a solution $\alpha$ is reached such that no neighbor of $\alpha$ can be viewed as a better solution than $\alpha$.

Output: $\alpha$.
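The scheme above can be sketched generically in Python as follows (names are ours). We assume minimization and first improvement, i.e., Step 2 moves to the first improving neighbor found. As the neighborhood we take the two-edge exchange for Hamiltonian tours mentioned above: reversing a segment of the tour removes two edges and inserts two others. Tours are vertex lists with the first vertex fixed. `FIG_2_19` holds assumed costs for the graph of Figure 2.19, reconstructed from the tour costs 12, 7, and 15 stated in Example 2.3.4.3 and from the spanning-tree examples later in this section.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

Sol = TypeVar("Sol")


def local_search(start: Sol, neighbors: Callable[[Sol], Iterator[Sol]],
                 cost: Callable[[Sol], float]) -> Sol:
    """Minimization: repeat Step 2 until no neighbor is cheaper (a local optimum)."""
    current = start
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current):
            if cost(candidate) < cost(current):
                current = candidate          # Step 2: move to an improving neighbor
                improved = True
                break                        # first improvement, then rescan
    return current


def two_exchange(tour: List[int]) -> Iterator[List[int]]:
    """Reverse tour[i..j]: edges {t[i-1], t[i]}, {t[j], t[j+1]} become {t[i-1], t[j]}, {t[i], t[j+1]}."""
    n = len(tour)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            yield tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def tour_cost(c: Sequence[Sequence[int]], tour: List[int]) -> int:
    """Cost of the Hamiltonian cycle tour[0], tour[1], ..., tour[-1], tour[0]."""
    return sum(c[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


# Assumed costs for Figure 2.19 (v_1..v_4 numbered 0..3), reconstructed from the text.
FIG_2_19 = [[0, 1, 2, 8], [1, 0, 2, 3], [2, 2, 0, 1], [8, 3, 1, 0]]
```

Starting from the tour $v_1, v_3, v_2, v_4, v_1$ of cost 15, the search moves to a tour of cost 12 and then to $v_1, v_2, v_4, v_3, v_1$ of cost 7, where it halts.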

The local search technique is usually very efficient, and it is one of the most popular methods for attacking hard optimization problems. Besides this, some successful heuristics like simulated annealing are based on this technique. Because of this, we postpone a deeper discussion and analysis of the local search technique to Chapters 3 and 6, which focus on the main topic of this book. We finish the presentation of this method here with a simple application example.

Example 2.3.4.4. A local search algorithm for the minimum spanning tree problem

The minimum spanning tree problem is to find, for a given weighted graph $G = (V, E, c)$, a spanning tree $T = (V, E')$, $E' \subseteq E$, with minimal cost. Let $T_1 = (V, E_1)$ and $T_2 = (V, E_2)$ be two spanning trees of $G$. We say that $T_1$ and $T_2$ are neighbors if $T_1$ can be obtained from $T_2$ by exchanging one edge (i.e., $|E_1 - E_2| = |E_2 - E_1| = 1$). Observe that in looking for neighbors one cannot exchange arbitrary edges, because this can destroy the connectivity. The simplest way to do a consistent transformation is to add a new edge $e$ to $T_1 = (V, E_1)$. After this we obtain the graph

$$(V, E_1 \cup \{e\})$$

that contains exactly one cycle, since $T_1$ is a spanning tree. If one removes an edge $h$ of the cycle with $c(h) > c(e)$, then one obtains a better spanning tree of $G$, which is a neighbor of $T_1$.




Fig. 2.21. 



Consider the graph $G = (V, E, c)$ depicted in Figure 2.19. We start the local search with the tree depicted in Figure 2.21a. Its cost is 10. Let us add the edge $\{v_2, v_4\}$ first. Then we get the cycle $v_1, v_2, v_4, v_1$ (Figure 2.21b). Removing the most expensive edge $\{v_1, v_4\}$ of this cycle, we get the spanning tree in Figure 2.21c with cost 5. Now, one adds the edge $\{v_2, v_3\}$ (Figure 2.21d). Removing $\{v_2, v_4\}$ from the cycle $v_2, v_3, v_4, v_2$, we obtain an optimal spanning tree (Figure 2.21e) whose cost is 4. The solution can also be obtained in one iterative step if one adds the edge $\{v_2, v_3\}$ in the first iterative step and then removes the edge $\{v_1, v_4\}$ (Figure 2.21f). □
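The following Python sketch implements the local search of Example 2.3.4.4; helper names are ours. For every edge $e$ outside the current tree, it finds the unique tree path between the endpoints of $e$, which together with $e$ forms the single cycle. Whenever the most expensive edge $h$ on that cycle has $c(h) > c(e)$, it exchanges $h$ for $e$. The costs in `FIG_2_19` are our reconstruction of Figure 2.19, not copied from it. From the tree costs 10, 5, and 4 reported above and the tour costs in Example 2.3.4.3 one obtains $c(v_1,v_2) = c(v_3,v_4) = 1$, $c(v_1,v_3) = c(v_2,v_3) = 2$, $c(v_2,v_4) = 3$, and $c(v_1,v_4) = 8$. The start tree $\{v_1v_2, v_1v_4, v_3v_4\}$ of cost 10 is likewise inferred, not taken from Figure 2.21a.

```python
from typing import Dict, FrozenSet, List, Optional, Set

Edge = FrozenSet[int]


def tree_path(tree: Set[Edge], u: int, v: int) -> List[Edge]:
    """Edges on the unique u-v path of a spanning tree (iterative depth-first search from u)."""
    adj: Dict[int, List[int]] = {}
    for e in tree:
        a, b = tuple(e)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent: Dict[int, Optional[int]] = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path: List[Edge] = []
    x = v
    while parent[x] is not None:
        p = parent[x]
        path.append(frozenset((x, p)))
        x = p
    return path


def mst_local_search(cost: Dict[Edge, int], tree: Set[Edge]) -> Set[Edge]:
    """Add an edge e; if the created cycle has an edge h with c(h) > c(e), exchange h for e."""
    tree = set(tree)
    improved = True
    while improved:
        improved = False
        for e in cost:
            if e in tree:
                continue
            u, v = tuple(e)
            cycle = tree_path(tree, u, v)             # together with e, the only cycle
            h = max(cycle, key=lambda f: cost[f])     # most expensive edge on it
            if cost[h] > cost[e]:
                tree = (tree - {h}) | {e}
                improved = True
                break
    return tree


def mk_edge(a: int, b: int) -> Edge:
    return frozenset((a, b))


# Assumed costs of Figure 2.19 (vertices v_1..v_4 as 1..4), reconstructed from the text.
FIG_2_19 = {mk_edge(1, 2): 1, mk_edge(3, 4): 1, mk_edge(1, 3): 2,
            mk_edge(2, 3): 2, mk_edge(2, 4): 3, mk_edge(1, 4): 8}
START = {mk_edge(1, 2), mk_edge(1, 4), mk_edge(3, 4)}   # assumed start tree of cost 10
```

The sketch tries edges in a different order than the run in the text, so its intermediate trees differ. Here it first adds $\{v_1, v_3\}$ and drops $\{v_1, v_4\}$. The result is again a spanning tree of cost 4.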

Exercise 2.3.4.5. Prove that the local search algorithm for the minimum spanning tree problem always computes an optimal solution in time $O(|E|^2)$. □





GREEDY ALGORITHMS. 

The greedy method is perhaps the most straightforward algorithm design technique for optimization problems. It resembles backtracking and local search in that one needs a specification of a feasible solution by a tuple $(p_1, p_2, \ldots, p_n)$, $p_i \in P_i$ for $i = 1, \ldots, n$, and in that any greedy algorithm can be viewed as a sequence of local steps. But greedy algorithms do not move from one feasible solution to another feasible solution. They start with an empty specification and fix one local parameter of the specification (for instance, $p_2$) forever. In the second step, a greedy algorithm fixes a second parameter (for instance, $p_1$) of the specification, and so on until a complete specification of a feasible solution is reached. The name greedy comes from the way in which the decisions about local specifications are made. A greedy algorithm chooses the parameter that seems to be the most promising among all possibilities to make the next local specification. It never reconsiders its decision, whatever situation may arise later. For instance, a greedy algorithm for TSP starts by deciding that the cheapest edge must be in the solution. This is locally the best choice if one has to specify only one edge of a feasible solution. In the next steps, it always adds to the specification the cheapest new edge that can, together with the already fixed edges, form a Hamiltonian tour.

Another point about greedy algorithms is that they realize exactly one path from the root to a leaf in the tree $T_{\mathcal{M}(x)}$ created by backtracking. In fact, an empty specification means that one considers the set of all feasible solutions $\mathcal{M}(x)$, and $\mathcal{M}(x)$ is the label of the root of the tree $T_{\mathcal{M}(x)}$. Specifying the first parameter $p_1$ corresponds to restricting $\mathcal{M}(x)$ to the set $S(p_1) = \{\alpha \in \mathcal{M}(x) \mid \text{the first parameter of the specification of } \alpha \text{ is } p_1\}$. Continuing this procedure, we obtain the sequence of sets of feasible solutions

$$\mathcal{M}(x) \supseteq S(p_1) \supseteq S(p_1, p_2) \supseteq S(p_1, p_2, p_3) \supseteq \cdots \supseteq S(p_1, p_2, \ldots, p_n),$$

where $|S(p_1, p_2, \ldots, p_n)| = 1$.

Following the above consideration, we see that greedy algorithms are usually very efficient, especially in comparison with backtracking. Another advantage of the greedy method is that it is easy to invent and easy to implement. The drawback of the greedy method is that too many optimization problems are too complex to be solved by such a naive strategy. On the other hand, even this simple approach can be helpful in attacking hard problems. Examples documenting this fact will be presented in Chapter 4. The following two examples show that the greedy method can efficiently solve the minimum spanning tree problem, and that it can be very weak for TSP.

Example 2.3.4.6 (Greedy for the minimum spanning tree problem).

The greedy algorithm for the minimum spanning tree problem can be simply described as follows.





GREEDY-MST

Input: A weighted connected graph $G = (V, E, c)$, $c: E \to \mathbb{N} - \{0\}$.

Step 1: Sort the edges according to their costs. Let $e_1, e_2, \ldots, e_m$ be the
        sequence of all edges of $E$ such that $c(e_1) \le c(e_2) \le \cdots \le c(e_m)$.

Step 2: Set $E' := \{e_1, e_2\}$; $l := 3$;

Step 3: while $|E'| < |V| - 1$ do
          begin add $e_l$ to $E'$ if $(V, E' \cup \{e_l\})$ does not contain any cycle;
                $l := l + 1$
          end

Output: $(V, E')$.





Fig. 2.22. 



Since $|E'| = |V| - 1$ and $(V, E')$ does not contain any cycle, it is obvious that $(V, E')$ is a spanning tree of $G$. Proving that $(V, E')$ is an optimal spanning tree is left to the reader.

We illustrate the work of Greedy-MST on the input instance $G$ depicted in Figure 2.19. Let $\{v_1, v_2\}, \{v_3, v_4\}, \{v_1, v_3\}, \{v_2, v_3\}, \{v_2, v_4\}, \{v_1, v_4\}$ be the sequence of the edges after sorting in Step 1. Then Figure 2.22 depicts the steps of the specification of the optimal solution $(V, \{\{v_1, v_2\}, \{v_3, v_4\}, \{v_1, v_3\}\})$.

□
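GREEDY-MST is essentially Kruskal's algorithm. The Python sketch below (our naming) keeps the cycle test of Step 3 cheap with a union-find structure over vertices $0, \ldots, n-1$. Edges are given as triples (cost, u, v), and ties in Step 1 are broken by vertex numbers. Unlike Step 2, the sketch does not add $e_1, e_2$ unconditionally, but the loop accepts them anyway, since two distinct edges never form a cycle. `FIG_2_19` lists assumed costs for Figure 2.19, reconstructed from the tour and tree costs stated in this section.

```python
from typing import List, Tuple

WeightedEdge = Tuple[int, int, int]  # (cost, u, v)


def greedy_mst(n: int, edges: List[WeightedEdge]) -> List[WeightedEdge]:
    """GREEDY-MST on vertices 0..n-1 (Kruskal's algorithm) with a union-find cycle test."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    tree: List[WeightedEdge] = []
    for cost, u, v in sorted(edges):       # Step 1: nondecreasing costs
        if len(tree) == n - 1:             # Step 3 stops at |E'| = |V| - 1
            break
        ru, rv = find(u), find(v)
        if ru != rv:                       # u and v in different components: no cycle
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree


# Assumed costs of Figure 2.19 (v_1..v_4 numbered 0..3), reconstructed from the text.
FIG_2_19 = [(1, 0, 1), (1, 2, 3), (2, 0, 2), (2, 1, 2), (3, 1, 3), (8, 0, 3)]
```

On these assumed costs the sorted order coincides with the sequence given in the text. The result is the tree $\{v_1, v_2\}, \{v_3, v_4\}, \{v_1, v_3\}$ of cost 4.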

Exercise 2.3.4.7. Prove that Greedy-MST always computes an optimal solution. □

Example 2.3.4.8 (Greedy for TSP). The greedy algorithm for TSP can be described as follows.

GREEDY-TSP

Input: A weighted complete graph $G = (V, E, c)$ with $c: E \to \mathbb{N} - \{0\}$,
       $|V| = n$ for some positive integer $n$.

Step 1: Sort the costs of the edges. Let $e_1, e_2, \ldots, e_{\binom{n}{2}}$ be the sequence of
        all edges of $G$ such that $c(e_1) \le c(e_2) \le \cdots \le c(e_{\binom{n}{2}})$.

Step 2: $E' := \{e_1, e_2\}$; $l := 3$;

Step 3: while $|E'| < n$ do
          begin add $e_l$ to $E'$ if $(V, E' \cup \{e_l\})$ contains neither a vertex
                of degree greater than 2 nor a cycle of length shorter than $n$;
                $l := l + 1$
          end

Output: $(V, E')$.



Fig. 2.23.




Since $|E'| = n$, no vertex has degree greater than 2, and $(V, E')$ contains no
cycle shorter than $n$, $(V, E')$ is a Hamiltonian tour. Considering the graph
partially depicted in Figure 2.23 we see that Greedy-TSP may produce solutions
whose costs are arbitrarily far from the optimal cost.

Let the edges $\{v_i, v_{i+1}\}$ for $i = 1, 2, \ldots, 5$ and $\{v_1, v_6\}$ have the costs as
depicted in Figure 2.23 and let all missing edges have the cost 2. Assume
that $a$ is a very large number. Obviously, Greedy-TSP takes first all edges
of cost 1. Then the Hamiltonian tour $v_1, v_2, v_3, v_4, v_5, v_6, v_1$ is unambiguously
determined and its cost is $a + 5$. Figure 2.24 presents an optimal solution
$v_1, v_4, v_3, v_2, v_5, v_6, v_1$ whose cost is 8.

Since one can choose an arbitrarily large $a$, the difference between $a + 5$
and 8 can be arbitrarily large. □
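The following Python sketch of GREEDY-TSP replays this counterexample. It assumes vertices 1, ..., 6 stand for $v_1, \ldots, v_6$ and uses the hypothetical value $a = 1000$. Because Figure 2.23 is not reproduced here, which ring edge has cost $a$ is also an assumption: we take $\{v_4, v_5\}$, which is consistent with the costs $a + 5$ and 8 stated above. The degree and cycle tests use a degree counter and union-find (our choice), and a brute-force search over all tours confirms the optimal cost.

```python
from itertools import combinations, permutations


def greedy_tsp(n, cost):
    """GREEDY-TSP on vertices 1..n; cost maps frozenset({u, v}) to a positive integer."""
    degree = [0] * (n + 1)
    parent = list(range(n + 1))   # union-find over the path fragments built so far

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    for e in sorted(cost, key=cost.get):          # Step 1: edges by nondecreasing cost
        if len(chosen) == n:                      # Step 3 stops at |E'| = n
            break
        u, v = sorted(e)
        if degree[u] == 2 or degree[v] == 2:      # would create a vertex of degree 3
            continue
        if find(u) == find(v) and len(chosen) < n - 1:
            continue                              # would close a cycle shorter than n
        parent[find(u)] = find(v)
        degree[u] += 1
        degree[v] += 1
        chosen.append(e)
    return chosen


A = 1000                                          # hypothetical value of the large cost a
cost = {frozenset(p): 2 for p in combinations(range(1, 7), 2)}   # missing edges cost 2
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)]:
    cost[frozenset((u, v))] = A if (u, v) == (4, 5) else 1       # assumption: {v4, v5} costs a


def tour_cost(order):
    return sum(cost[frozenset((order[i], order[(i + 1) % len(order)]))]
               for i in range(len(order)))


greedy_cost = sum(cost[e] for e in greedy_tsp(6, cost))
optimal_cost = min(tour_cost((1,) + p) for p in permutations(range(2, 7)))
print(greedy_cost, optimal_cost)  # 1005 8
```

All five edges of cost 1 are accepted, every edge of cost 2 then touches a vertex of degree 2, and the only way to finish is the edge of cost $a$.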







Fig. 2.24. 





Keywords introduced in Section 2.3.4 

divide-and-conquer, dynamic programming, backtracking, local search, greedy algorithms

Summary of Section 2.3.4 

Divide-and-conquer, dynamic programming, backtracking, local search, and greedy
algorithms are fundamental algorithm design techniques. These techniques are robust
and paradigmatic in the sense that, when facing a new algorithmic problem, the most
reasonable approach is to check whether one of these techniques alone can provide
an efficient solution.

Divide-and-conquer is a recursive technique based on breaking the given problem
instance into several problem subinstances in such a way that from the solutions to
the smaller problem instances one can easily compute a solution to the original
problem instance.

Dynamic programming is similar to the divide-and-conquer method in the sense 
that both techniques solve problems by combining the solutions to subproblems. 
The difference is that divide-and-conquer does it recursively by dividing problem 
instances into subinstances and calling itself on these subinstances, while dynamic 
programming works in a bottom-up fashion by starting with computing solutions 
to the smallest subinstances and continuing to larger and larger subinstances until the
original problem instance is solved. The main advantage of dynamic programming is 
that it solves every subinstance exactly once, while divide-and-conquer may compute 
a solution to the same subinstance many times. 
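To make the last point concrete, here is a small Python sketch on binomial coefficients, a problem not discussed in this section and chosen only as an illustration. The recursive version follows the divide-and-conquer pattern and solves the same subinstance $C(m, j)$ many times; the bottom-up version fills each entry of Pascal's triangle exactly once.

```python
calls = 0  # number of calls made by the divide-and-conquer version


def binom_dc(n, k):
    """Divide-and-conquer: C(n, k) = C(n-1, k-1) + C(n-1, k), no reuse of results."""
    global calls
    calls += 1
    if k == 0 or k == n:
        return 1
    return binom_dc(n - 1, k - 1) + binom_dc(n - 1, k)


def binom_dp(n, k):
    """Dynamic programming: build Pascal's triangle bottom-up, each entry computed once."""
    row = [1]                                    # row 0
    for _ in range(n):
        row = [1] + [row[j] + row[j + 1] for j in range(len(row) - 1)] + [1]
    return row[k]


print(binom_dc(20, 10), calls)   # 184756 369511
print(binom_dp(20, 10))          # 184756
```

Every leaf of the recursion tree returns 1, so the recursion makes $2\binom{20}{10} - 1 = 369511$ calls, whereas the bottom-up table for rows 0 to 20 has only 231 entries.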

Backtracking is a technique for solving optimization problems by a possibly 
exhaustive search of the set of all feasible solutions, in such a systematic way that 
one never looks twice at the same feasible solution. 

Local search is an algorithm design technique for optimization problems. The 
idea is to define a neighborhood in the set of all feasible solutions M(x) and then 
to search in M(x) going from a feasible solution to a neighboring feasible solution if 
the cost of the neighboring solution is better than the cost of the original solution. 
A local search algorithm stops with a feasible solution that is a local optimum 
according to the defined neighborhood. 

The greedy method is based on a sequence of steps, where in every step the algorithm
specifies one parameter of a feasible solution. The name greedy comes from the way
it chooses the parameters: the most promising choice among all possibilities is always
taken to specify the next parameter, and no decision is reconsidered later.




3 Deterministic Approaches




“As gravity is the essential quality of matter,
freedom is the basic quality of the spirit.
Human freedom is, first of all,
the freedom to perform creative work.”

Georg Wilhelm Friedrich Hegel 



3.1 Introduction 

In Section 2.3.3 we learned that there is no chance of using algorithms
of exponential complexity on large input instances because the execution of
many, for instance $2^{100}$, elementary operations lies beyond physical reality.
Assuming $P \neq NP$, there is no possibility to design polynomial-time (deterministic)
algorithms for solving NP-hard problems. The question is what can
be done if one wants to attack NP-hard problems with deterministic algorithms
in practice. In this chapter we consider the following three approaches:

• The first approach 

We try to design algorithms for solving hard problems and we accept their 
(worst case) exponential complexity if they are efficient and fast enough 
for most of the problem instances appearing in the specific applications 
considered. Observe that the success of this approach is possible because 
we have defined complexity as the worst case complexity and so there may 
exist many input instances of the given hard problem that can be solved 
fast by a suitable algorithm. One can roughly view this approach as an 
effort to partition the set of all instances of a hard problem into two subsets, 
where one subset contains the easy instances and the other one contains the 
hard instances. 1 If the problem instances appearing in a specific application 

1 In fact, it is not realistic to search for a clear border between easy problem
instances and hard problem instances of an algorithmic problem. Realistically, we
only search for large subclasses of easy problem instances, and we check whether
typical problem instances in the considered application belong to these easy
subclasses.







are typically (or even always) in the subset of easy problem instances, we 
have been successful in solving the particular practical task. 

• The second approach 

We design exponential algorithms for hard problems, and we even accept
that their average 2 time complexity is exponential. The crucial idea is that
one tries to design algorithms with “slowly increasing” worst case exponential
time complexity. For instance, one can design an algorithm $A$ with
$Time_A(n) = (1.2)^n$ or $Time_A(n) = 2^{\sqrt{n}}$ instead of a straightforward
algorithm of time complexity $2^n$. Clearly, this approach moves the border
of tractability because one can successfully use $2^{\sqrt{n}}$- and $(1.2)^n$-algorithms
for problem instances of sizes for which a $2^n$-algorithm does not have any
chance of finding a solution in a reasonable time.

• The third approach 

We remove the requirement that our algorithm has to solve the given 
problem. In the case of optimization problems one can be satisfied with 
a feasible solution that does not need to be optimal if it has some other 
reasonable properties (for instance, its cost is above some threshold or not 
too far away from the optimal cost). The typical representative algorithms 
of this approach are approximation algorithms that provide solutions with 
costs “close” to the cost of the optimal solutions. Because of the huge suc- 
cess of approximation algorithms for solving hard optimization problems 
we devote all of Chapter 4 to them and so we do not discuss them here. 3 
Sometimes one may relax the requirements even further and be satisfied
with only some practical information about the solution. An example is to
compute a bound on the cost of an optimal solution instead of searching for an
optimal solution. Such information may be very helpful in trying to solve
specific problem instances using other approaches, and so algorithms based
on this approach are often used as a precomputation for other algorithms.
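The effect of the second approach on the tractable input size can be checked with a little arithmetic. The Python sketch below assumes a hypothetical budget of $10^{15}$ elementary operations and computes the largest $n$ for which $2^n$, $(1.2)^n$, and $2^{\sqrt{n}}$ stay within it; exact integer and rational arithmetic is used where possible to avoid rounding.

```python
import math
from fractions import Fraction

BUDGET = 10**15  # hypothetical number of elementary operations we can afford


def largest_tractable_n(time, budget=BUDGET):
    """Largest n with time(n) <= budget; assumes time(n) is increasing in n."""
    n = 0
    while time(n + 1) <= budget:
        n += 1
    return n


sizes = {
    "2^n": largest_tractable_n(lambda n: 2**n),                    # exact integers
    "(1.2)^n": largest_tractable_n(lambda n: Fraction(6, 5)**n),   # exact rationals
    "2^sqrt(n)": largest_tractable_n(lambda n: 2**math.sqrt(n)),   # floating point
}
print(sizes)  # {'2^n': 49, '(1.2)^n': 189, '2^sqrt(n)': 2482}
```

Under this assumed budget, a $2^n$-algorithm handles only $n \le 49$, while $(1.2)^n$- and $2^{\sqrt{n}}$-algorithms reach $n = 189$ and $n = 2482$, respectively.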

Note that one can combine several different approaches in order to design 
an algorithm for a given hard problem. The ideas presented in the above 
approaches lead to the development of several concepts and methods for the 
design of algorithms for hard problems. In this chapter we present the following 
ones: 

(i) Pseudo-polynomial-time algorithms, 

(ii) Parameterized complexity, 

(iii) Branch-and-Bound, 

(iv) Lowering the worst case exponential complexity, 

(v) Local search, 

(vi) Relaxation to linear programming. 

2 The average is considered over all inputs. 

3 Despite the fact that they belong to the deterministic approaches for solving hard 
problems. 







All these concepts and design methods are of fundamental importance in 
the same sense as the algorithm design techniques reviewed in Section 2.3.4. 
If one faces a hard problem in a specific application, it is reasonable to check
first whether some of the above concepts and methods can provide reasonable
solutions.

One should be careful to distinguish between concepts and algorithm design
techniques here. Branch-and-bound, local search, and relaxation to linear
programming are robust algorithm design techniques like divide-and-conquer,
dynamic programming, etc. Pseudo-polynomial-time algorithms, parameterized
complexity, and lowering the worst case exponential complexity,
on the other hand, are concepts 4 providing ideas and frameworks for attacking
hard algorithmic problems. To use these concepts one usually needs to apply
some of the algorithm design techniques (alone or in combination) presented
in Section 2.3.4.

This chapter is organized as follows. Section 3.2 is devoted to
pseudo-polynomial-time algorithms, whose concept is based on the first approach.
Here we consider algorithmic problems whose input instances consist of a
collection of integers. A pseudo-polynomial-time algorithm is a (potentially
exponential-time) algorithm whose time complexity is polynomial in the number
$n$ of input integers and in the values of the input integers. Since the value of
an integer is exponential in the length of its binary representation,
pseudo-polynomial-time algorithms are generally exponential in the input size. But
pseudo-polynomial-time algorithms work in polynomial time on problem instances
where the values of the input integers are polynomial in the number of the input
integers. Thus, such problem instances may be considered easy, and one can solve
the problem for them.

Section 3.3 is devoted to the concept of parameterized complexity, which
is a more general application of the idea of the first approach than the case of
pseudo-polynomial-time algorithms. Here, one tries to partition the set of all
problem instances into possibly infinitely many subsets according to the value
of some input parameter (characteristic) and to design an algorithm that is
polynomial in the input length but not in the value of this parameter. For
instance, an algorithm can have the time complexity $2^k \cdot n$, where $n$ is the
input size and $k$ is the value of the parameter. Obviously, for small $k$ this
algorithm is very efficient, but for $k = n$ it is exponential. Such an algorithm
may be very successful if one can reasonably bound $k$ for the problem instances
appearing in the specific application.

The branch-and-bound method for optimization problems is presented in
Section 3.4. It may be viewed as a combination of the first approach and
the third approach in that one tries to make backtracking more efficient by
using some additional information about the cost of an optimal solution. This
additional information can be obtained by some precomputation based on the
third approach.

4 That is, not design techniques.







The concept of lowering the worst case exponential complexity presented
in Section 3.5 is the pure application of the second approach. Thus, one does
not relax any requirement in the formulation of the hard problem. The effort
to increase the tractable size of problem instances is based on the design of
algorithms whose exponential complexity grows slowly enough to remain
moderate for moderately large input sizes.

Section 3.6 is devoted to local search algorithms for solving optimization
problems. Local search algorithms always produce local optima, and so they
represent a special application of the third approach. In this section we deepen
our knowledge about the local search method, whose elementary fundamentals
were presented in Section 2.3.4.

Section 3.7 is devoted to the method of relaxation to linear programming. 
The basic ideas are that many optimization problems can be efficiently re- 
duced to 0/1-linear programming or integer programming, and that linear 
programming can be solved efficiently while 0/1- and integer programming 
are NP-hard. But the difference between integer programming and linear pro- 
gramming is only in the domains and so one can relax the problem of solving 
an instance of integer programming to solving it as an instance of linear
programming. Obviously, the resulting solution need not be a feasible solution of
the original problem; rather, it provides some information about an optimal 
solution to the original problem. For instance, it provides a bound on the cost 
of optimal solutions. The information obtained by the relaxation method can 
be used to solve the original problem by another approach. 

Sections 3.2, 3.3, 3.4, 3.5, 3.6, and 3.7 are presented in a uniform way. In 
the first subsection the basic concept of the corresponding algorithm design 
method is explained and formalized, if necessary. The second subsection illus- 
trates the concepts by designing some algorithms for specific hard problems. 
In the last subsection, if there is one, the limits of the applicability of the 
corresponding concepts are discussed. 

This chapter finishes with bibliographical remarks in Section 3.8. The main 
goal of this section is not only to give an overview of the history of the devel- 
opment of the concepts presented, but mainly to provide information about 
materials deepening the knowledge presented here. Once again we call atten- 
tion to the fact that in this introductory material we present simple examples 
transparently illustrating the basic concepts and ideas rather than the best 
known technical algorithms based on these concepts. Thus, reading additional 
material is a necessary condition to become an expert in the application of 
the presented concepts. 

3.2 Pseudo-Polynomial-Time Algorithms 

3.2.1 Basic Concept 

In this section we consider algorithmic problems whose inputs can be viewed as
a collection of integers. Such problems are sometimes called integer-valued






problems. In what follows we fix the coding of inputs to words over $\{0, 1, \#\}$,
where $x = x_1\#x_2\#\cdots\#x_n$, $x_i \in \{0, 1\}^*$ for $i = 1, 2, \ldots, n$, is interpreted as
a vector of $n$ integers

$$Int(x) = (Number(x_1), Number(x_2), \ldots, Number(x_n)).$$

Obviously, problems such as TSP, the knapsack problem, integer programming,
and the vertex-cover problem can be viewed as integer-valued problems. 5 The
size of any input $x \in \{0, 1, \#\}^*$ is considered to be the length $|x|$
of $x$ as a word. 6 Obviously, if $Int(x)$ is a vector of $n$ integers, then $n \le |x|$.
In addition, we are interested in the following characteristic of the input.

For every $x \in \{0, 1, \#\}^*$, $x = x_1\#\cdots\#x_n$, $x_i \in \{0, 1\}^*$ for $i = 1, \ldots, n$,
we define

$$\text{Max-Int}(x) = \max\{Number(x_i) \mid i = 1, 2, \ldots, n\}.$$

The main idea of the concept of pseudo-polynomial-time algorithms is to
design algorithms that are efficient for input instances $x$ whose $\text{Max-Int}(x)$
is not too large with respect to $|x|$.
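A minimal Python sketch of this coding, assuming every block $x_i$ is nonempty (the text does not say how an empty block is read). The last line shows why pseudo-polynomial time is not polynomial time in general: the word $1^{20}$ has length 20 but encodes $2^{20} - 1$, so $\text{Max-Int}(x)$ can be exponential in $|x|$.

```python
from typing import List


def int_vector(x: str) -> List[int]:
    """Int(x): read each #-separated block of x in {0,1,#}* as a binary number."""
    if set(x) - {"0", "1", "#"}:
        raise ValueError("x must be a word over {0, 1, #}")
    # Assumption: every block x_i is nonempty; int("", 2) would raise ValueError.
    return [int(block, 2) for block in x.split("#")]


def max_int(x: str) -> int:
    """Max-Int(x) = max{Number(x_i) | i = 1, ..., n}."""
    return max(int_vector(x))


x = "101#11#1000"
print(len(x), int_vector(x), max_int(x))   # 11 [5, 3, 8] 8
print(max_int("1" * 20))                   # 1048575, although |x| = 20
```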

Definition 3.2.1.1. Let $U$ be an integer-valued problem, and let $A$ be an
algorithm that solves $U$. We say that $A$ is a pseudo-polynomial-time
algorithm for $U$ if there exists a polynomial $p$ of two variables such that

$$Time_A(x) \in O(p(|x|, \text{Max-Int}(x)))$$

for every instance $x$ of $U$.

We immediately observe that, for input instances $x$ with $\text{Max-Int}(x) \le
h(|x|)$ for a polynomial $h$, $Time_A(x)$ of a pseudo-polynomial-time algorithm
$A$ is bounded by a polynomial. This can be formally expressed as follows.

Definition 3.2.1.2. Let $U$ be an integer-valued problem, and let $h$ be a
nondecreasing function from $\mathbb{N}$ to $\mathbb{N}$. The h-value-bounded subproblem of
$U$, Value(h)-U, is the problem obtained from $U$ by restricting the set of all
input instances of $U$ to the set of input instances $x$ with $\text{Max-Int}(x) \le h(|x|)$.

Theorem 3.2.1.3. Let $U$ be an integer-valued problem, and let $A$ be a
pseudo-polynomial-time algorithm for $U$. Then, for every polynomial $h$, there exists a
polynomial-time algorithm for Value(h)-U (i.e., if $U$ is a decision problem then
Value(h)-U $\in$ P, and if $U$ is an optimization problem then Value(h)-U $\in$ PO).

Proof. Since $A$ is a pseudo-polynomial-time algorithm for $U$, there exists a
polynomial $p$ of two variables such that

5 In fact, every problem whose input instances are (weighted) graphs is an
integer-valued problem.

6 We prefer the precise measurement of the input size here. 







$$Time_A(x) \in O(p(|x|, \text{Max-Int}(x)))$$

for every input instance $x$ of $U$. Let $h(n) \in O(n^c)$ for some positive integer
constant $c$. Then $\text{Max-Int}(x) \in O(|x|^c)$ for every input instance $x$ of Value(h)-U,
so $p(|x|, \text{Max-Int}(x))$ is bounded by a polynomial in $|x|$, and $A$ is a
polynomial-time algorithm for Value(h)-U. □

The concept of pseudo-polynomial-time algorithms follows the first ap- 
proach that proposes to attack hard problems by searching for large subclasses 
of easy problem instances. Observe that pseudo-polynomial-time algorithms 
can be practical in many applications. For instance, for optimization problems 
whose input instances are weighted graphs, it often happens that the weights
are chosen from a fixed interval of values, i.e., the values are independent of 
the input size. In such cases pseudo-polynomial-time algorithms may be very 
fast. 

In the following Section 3.2.2 we use the method of dynamic programming 
to design a pseudo-polynomial-time algorithm for the knapsack problem.
Section 3.2.4 is devoted to a discussion about the limits of applicability of the
concept of pseudo-polynomial-time algorithms. There, we present a simple 
method that enables us to prove the nonexistence of pseudo-polynomial-time 
algorithms for some hard problems, unless P = NP. 



3.2.2 Dynamic Programming and Knapsack Problem 

Remember that the instances of the knapsack problem (KP) are sequences
of $2n + 1$ integers $(w_1, w_2, \ldots, w_n, c_1, c_2, \ldots, c_n, b)$, $n \in \mathbb{N} - \{0\}$, where $b$
is the weight capacity of the knapsack, $w_i$ is the weight of the $i$th object,
and $c_i$ is the cost of the $i$th object for $i = 1, 2, \ldots, n$. The objective is to
maximize the total cost of the objects (the profit) packed into the knapsack
under the constraint that the total weight of the objects in the knapsack is
not above $b$. Any solution to an input instance $I = (w_1, \ldots, w_n, c_1, \ldots, c_n, b)$
of KP can be represented as a set $T \subseteq \{1, 2, \ldots, n\}$ of indices such that
$\sum_{i \in T} w_i \le b$, and we shall use this representation in what follows. Note that
$cost(T, I) = \sum_{i \in T} c_i$. Our aim is to solve KP by the method of dynamic
programming. To do this we do not need to consider all $2^n$ problem
subinstances 7 of a problem instance $I = (w_1, \ldots, w_n, c_1, \ldots, c_n, b)$, but only the
problem subinstances $I_i = (w_1, \ldots, w_i, c_1, c_2, \ldots, c_i, b)$ for $i = 1, 2, \ldots, n$.
More precisely, the idea is to compute for every $I_i$, $i = 1, 2, \ldots, n$, and every
integer $k \in \{0, 1, 2, \ldots, \sum_{j=1}^{i} c_j\}$, a triple (if it exists)

$$(k, W_{i,k}, T_{i,k}) \in \left\{0, 1, 2, \ldots, \sum_{j=1}^{i} c_j\right\} \times \{0, 1, 2, \ldots, b\} \times Pot(\{1, \ldots, i\}),$$



7 Note that the main art of applying dynamic programming is to find a small 
subset of input subinstances whose solutions are sufficient to efficiently compute 
a solution to the original problem instance. 







where $W_{i,k} \le b$ is the minimal weight with which one can achieve exactly the
profit $k$ for the input instance $I_i$, and $T_{i,k} \subseteq \{1, 2, \ldots, i\}$ is a set of indices
that provides the profit $k$ under the weight $W_{i,k}$, i.e.,

$$\sum_{j \in T_{i,k}} c_j = k \quad \text{and} \quad \sum_{j \in T_{i,k}} w_j = W_{i,k}.$$

Note that there may be several sets of indices satisfying the above conditions. 
In such a case it does not matter which one we choose. On the other hand, 
it may happen that the profit $k$ is not achievable for $I_i$. In such a case we
do not produce any triple for $k$. In what follows $TRIPLE_i$ denotes the set of
all triples produced for $I_i$. Observe that $|TRIPLE_i| \le \sum_{j=1}^{i} c_j + 1$ and that
$|TRIPLE_i|$ is exactly the number of achievable profits of $I_i$.

Example 3.2.2.1. Consider the problem instance

$$I = (w_1, \ldots, w_5, c_1, \ldots, c_5, b),$$

where $w_1 = 23$, $c_1 = 33$, $w_2 = 15$, $c_2 = 23$, $w_3 = 15$, $c_3 = 11$, $w_4 = 33$,
$c_4 = 35$, $w_5 = 32$, $c_5 = 11$, and $b = 65$. Thus, $I_1 = (w_1 = 23, c_1 = 33, b = 65)$.
The only achievable profits are 0 and 33, and so

$$TRIPLE_1 = \{(0, 0, \emptyset), (33, 23, \{1\})\}.$$

$I_2 = (w_1 = 23, w_2 = 15, c_1 = 33, c_2 = 23, b = 65)$. The achievable profits are
0, 23, 33, and 56, and so

$$TRIPLE_2 = \{(0, 0, \emptyset), (23, 15, \{2\}), (33, 23, \{1\}), (56, 38, \{1, 2\})\}.$$

$I_3 = (23, 15, 15, 33, 23, 11, 65)$. The achievable profits are 0, 11, 23, 33, 34, 44,
56, and 67, and

$$\begin{aligned}
TRIPLE_3 = \{&(0, 0, \emptyset), (11, 15, \{3\}), (23, 15, \{2\}), (33, 23, \{1\}),\\
&(34, 30, \{2, 3\}), (44, 38, \{1, 3\}), (56, 38, \{1, 2\}),\\
&(67, 53, \{1, 2, 3\})\}.
\end{aligned}$$

For $I_4 = (23, 15, 15, 33, 33, 23, 11, 35, 65)$,

$$\begin{aligned}
TRIPLE_4 = \{&(0, 0, \emptyset), (11, 15, \{3\}), (23, 15, \{2\}), (33, 23, \{1\}),\\
&(34, 30, \{2, 3\}), (35, 33, \{4\}), (44, 38, \{1, 3\}), (46, 48, \{3, 4\}),\\
&(56, 38, \{1, 2\}), (58, 48, \{2, 4\}), (67, 53, \{1, 2, 3\}),\\
&(68, 56, \{1, 4\}), (69, 63, \{2, 3, 4\})\}.
\end{aligned}$$

Finally, for $I = I_5$,

$$\begin{aligned}
TRIPLE_5 = \{&(0, 0, \emptyset), (11, 15, \{3\}), (22, 47, \{3, 5\}), (23, 15, \{2\}),\\
&(33, 23, \{1\}), (34, 30, \{2, 3\}), (35, 33, \{4\}), (44, 38, \{1, 3\}),\\
&(45, 62, \{2, 3, 5\}), (46, 48, \{3, 4\}), (56, 38, \{1, 2\}), (58, 48, \{2, 4\}),\\
&(67, 53, \{1, 2, 3\}), (68, 56, \{1, 4\}), (69, 63, \{2, 3, 4\})\}.
\end{aligned}$$







Clearly, $\{2, 3, 4\}$ is an optimal solution because $(69, 63, \{2, 3, 4\})$ is the triple
with the maximal profit 69 in $TRIPLE_5$, and $TRIPLE_5$ contains a triple for
every profit achievable under the weight constraint $b$. □

Computing the set $TRIPLE_n$ for the original input instance $I = I_n$ provides
an optimal solution to $I$. The optimal cost $Opt_{KP}(I)$ is the maximal
achievable profit appearing in $TRIPLE_n$, and the corresponding $T_{n, Opt_{KP}(I)}$ is
an optimal solution.

The main point is that one can compute $TRIPLE_{i+1}$ from $TRIPLE_i$ in
time $O(|TRIPLE_i|)$. First, one computes

$$SET_{i+1} = TRIPLE_i \cup \{(k + c_{i+1}, W_{i,k} + w_{i+1}, T_{i,k} \cup \{i+1\}) \mid (k, W_{i,k}, T_{i,k}) \in TRIPLE_i \text{ and } W_{i,k} + w_{i+1} \le b\}$$

by taking the original set and adding the $(i+1)$th object to the knapsack of
every triple if possible. In this way one can get several different triples with
the same profit. We put into $TRIPLE_{i+1}$ exactly one triple from $SET_{i+1}$ for
every achievable profit $k$ by choosing a triple that achieves the profit $k$ with
minimal weight. It does not matter which triple is chosen from $SET_{i+1}$ if
several triples have the same profit $k$ and the same weight.

Thus, we can describe our algorithm for the knapsack problem as follows: 

Algorithm 3.2.2.2 (DPKP).

Input: $I = (w_1, w_2, \ldots, w_n, c_1, c_2, \ldots, c_n, b) \in (\mathbb{N} - \{0\})^{2n+1}$, $n$ a positive
       integer.

Step 1: $TRIPLE(1) := \{(0, 0, \emptyset)\} \cup \{(c_1, w_1, \{1\}) \mid \text{if } w_1 \le b\}$.

Step 2: for $i = 1$ to $n - 1$ do
          begin $SET(i+1) := TRIPLE(i)$;
                for every $(k, w, T) \in TRIPLE(i)$ do
                    if $w + w_{i+1} \le b$ then
                        $SET(i+1) := SET(i+1) \cup \{(k + c_{i+1}, w + w_{i+1}, T \cup \{i+1\})\}$;
                Set $TRIPLE(i+1)$ as a subset of $SET(i+1)$ containing exactly
                one triple $(m, w', T')$ for every achievable profit $m$ in $SET(i+1)$
                by choosing a triple with the minimal weight for the given $m$
          end

Step 3: Compute $c := \max\{k \in \{0, 1, \ldots, \sum_{i=1}^{n} c_i\} \mid (k, w, T) \in TRIPLE(n)$
        for some $w$ and $T\}$.

Output: The index set $T$ such that $(c, w, T) \in TRIPLE(n)$.

Example 3.2.2.1 provides a rough illustration of the work of the algorithm
DPKP. Obviously, DPKP solves the knapsack problem.
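A compact Python sketch of DPKP follows. It assumes the input is given as 0-based Python lists `w`, `c` and an integer `b`, while the index sets use 1-based object numbers as in the text. Each $TRIPLE_i$ is stored as a dictionary from the profit $k$ to the pair $(W_{i,k}, T_{i,k})$, so keeping one minimal-weight triple per profit is a single dictionary update; this representation is our choice. Run on the instance of Example 3.2.2.1, it reproduces the sizes of $TRIPLE_1, \ldots, TRIPLE_5$ and the optimal solution $\{2, 3, 4\}$.

```python
from typing import Dict, FrozenSet, List, Tuple

# TRIPLE_i as a dict: profit k -> (W_{i,k}, T_{i,k}); object indices are 1-based.
Triples = Dict[int, Tuple[int, FrozenSet[int]]]


def dpkp(w: List[int], c: List[int], b: int) -> Tuple[int, FrozenSet[int], List[Triples]]:
    n = len(w)
    # Step 1: TRIPLE_1
    triples: Triples = {0: (0, frozenset())}
    if w[0] <= b:
        triples[c[0]] = (w[0], frozenset({1}))
    history = [triples]
    # Step 2: derive TRIPLE_{i+1} from TRIPLE_i for i = 1, ..., n-1
    for i in range(1, n):                  # w[i], c[i] belong to object i+1
        new: Triples = dict(triples)       # SET_{i+1} starts as a copy of TRIPLE_i
        for k, (weight, T) in triples.items():
            if weight + w[i] <= b:         # object i+1 still fits
                k2, w2 = k + c[i], weight + w[i]
                # keep only one triple per profit: the one of minimal weight
                if k2 not in new or w2 < new[k2][0]:
                    new[k2] = (w2, T | {i + 1})
        triples = new
        history.append(triples)
    # Step 3: the maximal achievable profit and its index set
    best = max(triples)
    return best, triples[best][1], history


w = [23, 15, 15, 33, 32]
c = [33, 23, 11, 35, 11]
profit, T, history = dpkp(w, c, 65)
print(profit, sorted(T), [len(t) for t in history])   # 69 [2, 3, 4] [2, 4, 8, 13, 15]
```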

Theorem 3.2.2.3. For every input instance $I$ of KP,

$$Time_{DPKP}(I) \in O(|I|^2 \cdot \text{Max-Int}(I)),$$

i.e., DPKP is a pseudo-polynomial-time algorithm for KP.







Proof. The complexity of Step 1 is in $O(1)$. For $I = (w_1, w_2, \ldots, w_n, c_1, \ldots, c_n, b)$,
we have to compute $n - 1$ sets $TRIPLE(i)$ in Step 2. The computation of
$TRIPLE(i+1)$ from $TRIPLE(i)$ can be realized in $O(|TRIPLE(i+1)|)$ time.
Since $|TRIPLE(i)| \le \sum_{j=1}^{i} c_j + 1 \le n \cdot \text{Max-Int}(I) + 1$ for every $i \in \{1, 2, \ldots, n\}$,
the complexity of Step 2 is in $O(n^2 \cdot \text{Max-Int}(I))$. The complexity of Step 3 is in
$O(n \cdot \text{Max-Int}(I))$ because one has only to find the maximum of the set of
$|TRIPLE(n)|$ elements.

Since $n \le |I|$, the time complexity of DPKP on $I$ is in $O(|I|^2 \cdot \text{Max-Int}(I))$.

□

We see that DPKP is an efficient algorithm for KP if the values of the
input integers are not too large relative to the number of integers of the input
instances. If the values are taken from some fixed interval (independent of
the size of input instances), DPKP is a quadratic algorithm for KP. This situation
occurs in several applications. If some of the values of the input instances are
very large, one can try to make them smaller by dividing all of them by the same
large integer. Because some rounding is necessary in such a procedure, we can lose
the guarantee of computing an optimal solution by applying this idea. But,
as we shall see later in Chapter 4 on approximation algorithms, one obtains
the guarantee of computing a reasonably good approximation of the optimal
solution in this way.

3.2.3 Maximum Flow Problem and Ford-Fulkerson Method

In this section we consider the maximum flow problem which is an optimiza- 
tion problem that can be solved in polynomial time. The reason to present a 
pseudo-polynomial-time algorithm for a problem that is not really hard (at 
least not NP-hard) lies in the solution method. The Ford-Fulkerson method 
used here to solve this problem provides a base for a powerful method for 
attacking NP-hard optimization problems. The investigation of a generaliza- 
tion of the Ford-Fulkerson method will be continued in Section 3.7 which is 
devoted to the relaxation of hard problems to linear programming and to the 
LP-duality. 

In the maximum flow problem we have a network modeled by a directed
graph (Fig. 3.1) with two special vertices: the source s and the sink t.
The aim is to send as much material as possible via the network from the
source to the sink. The constraints are a bounded capacity of every directed
edge of the network (a maximum amount of material that can flow through
the edge connection in one time unit) and the zero capacity of all vertices
except the source and the sink. The zero capacity of a vertex means that the
vertex does not have any store, and so it cannot collect any material. The
consequence is that everything that flows into such a vertex by its ingoing
edges must immediately leave this vertex through its outgoing edges. The
maximum flow problem is a fundamental problem because it is a model of
several different optimization tasks that appear in real life. For instance, it






captures the problem of transmitting information (messages) from one person
(computer) to another in a communication network, the problem of delivering
electric current from a power station to a user, the problem of delivering
different kinds of liquid, or the problem of transporting products from a
factory to a shop.

In the subsequent definitions we give a formal specification of the maximum 
flow problem. 

Definition 3.2.3.1. A network is a 5-tuple $H = (G, c, A, s, t)$, where

(i) $G = (V, E)$ is a directed graph,

(ii) $c$ is a capacity function from $E$ to $A$, and $c(e)$ is called the capacity of $e$
for every $e \in E$,

(iii) $A$ is a subset of $\mathbb{R}^+$,

(iv) $s \in V$ is the source of the network, and

(v) $t \in V$ is the sink of the network.

For every $v \in V$, the set

$$In_H(v) = \{(u, v) \mid (u, v) \in E\}$$

is the set of the ingoing edges of $v$ and the set

$$Out_H(v) = \{(v, u) \mid (v, u) \in E\}$$

is the set of the outgoing edges of $v$.

A flow function of $H$ is any function $f : E \to A$ such that

(1) $0 \le f(e) \le c(e)$ for all $e \in E$
(the flow $f(e)$ of every edge $e$ is nonnegative and bounded by the capacity
of $e$), and

(2) for all $v \in V - \{s, t\}$

$$\sum_{e \in In_H(v)} f(e) - \sum_{h \in Out_H(v)} f(h) = 0$$

(for every vertex $v$ different from the sink and from the source, the incoming
flow to $v$ is equal to the outgoing flow of $v$).

The flow $F_f$ of $H$ with respect to a flow function $f$ is defined as

$$F_f = \sum_{h \in Out_H(s)} f(h) - \sum_{e \in In_H(s)} f(e),$$

i.e., the flow of $H$ is measured as the amount of material that leaves the source.

□ 

Fig. 3.2 shows the network from Fig. 3.1 with the flow function $f$ described
by the labelling $f(e)/c(e)$ for every edge. The corresponding flow is
$F_f = 6 + 10 - 2 = 14$, and this flow is not optimal.





Exercise 3.2.3.2. Prove that for every network $H$ and every flow function
$f$ of $H$

$$F_f = \sum_{e \in In_H(t)} f(e) - \sum_{h \in Out_H(t)} f(h),$$

i.e., that $F_f$ can be measured as the amount of material that reaches the sink
and remains in it.
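The two properties of a flow function and the definition of $F_f$ can be checked mechanically. The Python sketch below uses a small hypothetical network with vertices s, a, b, t and a hypothetical flow on it (Fig. 3.1 and Fig. 3.2 are not reproduced here). It measures the flow both at the source and at the sink; this is a sanity check of Exercise 3.2.3.2 on one instance, not a proof.

```python
# Hypothetical network (not Fig. 3.1): edge (u, v) -> capacity c(u, v).
cap = {("s", "a"): 10, ("s", "b"): 5, ("a", "b"): 4,
       ("a", "t"): 8, ("b", "t"): 10, ("b", "s"): 3}
# A hypothetical flow function on that network.
f = {("s", "a"): 9, ("s", "b"): 5, ("a", "b"): 1,
     ("a", "t"): 8, ("b", "t"): 4, ("b", "s"): 2}


def is_flow_function(cap, f, s, t):
    """Check property (1) (capacity bounds) and property (2) (conservation)."""
    if any(not 0 <= f[e] <= cap[e] for e in cap):
        return False
    vertices = {v for e in cap for v in e}
    for v in vertices - {s, t}:
        inflow = sum(f[e] for e in cap if e[1] == v)
        outflow = sum(f[e] for e in cap if e[0] == v)
        if inflow != outflow:
            return False
    return True


def net_outflow(f, v):
    """Flow on edges leaving v minus flow on edges entering v."""
    return (sum(x for (u, w), x in f.items() if u == v)
            - sum(x for (u, w), x in f.items() if w == v))


print(is_flow_function(cap, f, "s", "t"))          # True
print(net_outflow(f, "s"), -net_outflow(f, "t"))   # 12 12
```

Here $F_f = 9 + 5 - 2 = 12$ at the source, and the sink receives $8 + 4 = 12$.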

The maximum flow problem is, for a given network $H = ((V, E), c, A, s, t)$,
to find a flow function $f : E \to A$ such that the flow $F_f$ is maximal.





The first helpful observation we need is that one can measure the flow of 
a network H not only at the source (or the sink), but at the edges of any cut 
of H that separates the vertices t and s. 

Definition 3.2.3.3. Let $H = ((V, E), c, A, s, t)$ be a network. Let $S \subseteq V$ be a
set of vertices such that $s \in S$ and $t \notin S$, and denote $\bar{S} = V - S$.







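Let

$$E(S, \bar{S}) = \{(x, y) \mid x \in S \text{ and } y \in \bar{S}\} \cap E$$

and

$$E(\bar{S}, S) = \{(u, v) \mid u \in \bar{S} \text{ and } v \in S\} \cap E.$$

The cut $H(S)$ of $H$ with respect to $S$ is

$$H(S) = E(S, \bar{S}) \cup E(\bar{S}, S).$$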



□ 

Lemma 3.2.3.4. Let $H = ((V, E), c, A, s, t)$, $A \subseteq \mathbb{R}^+$, be a network and let
$f$ be a flow function of $H$. Then, for every $S \subseteq V - \{t\}$ with $s \in S$,

$$F_f = \sum_{e \in E(S, \bar{S})} f(e) - \sum_{e \in E(\bar{S}, S)} f(e). \qquad (3.1)$$



Proof. We prove this lemma by induction on $|S|$.

(i) Let $|S| = 1$, i.e., $S = \{s\}$. Then $Out_H(s) = E(S, \bar{S})$ and $In_H(s) = E(\bar{S}, S)$,
and so (3.1) is nothing else than the definition of $F_f$.

(ii) Let (3.1) be true for all $S$ with $|S| \le k \le |V| - 2$. We prove (3.1) for all
$S'$ with $|S'| = k + 1$, $s \in S'$, $t \notin S'$. Obviously, we can write $S' = S \cup \{v\}$,
where $|S| = k$ and $v \in \bar{S} - \{t\}$. Following the definition of the flow
function (Property (2)), we see that the incoming flow to $v$ is equal to the
outgoing flow from $v$. This means that moving $v$ from $\bar{S}$ to $S$ does
not change the net flow between the two sides of the cut (Fig. 3.3). To formalize
this transparent idea, we first express the flow leaving $S \cup \{v\}$ for $\bar{S} - \{v\}$
and the flow leaving $\bar{S} - \{v\}$ for $S \cup \{v\}$ in terms of the flow between $S$ and $\bar{S}$.
Let $In_S(v) = In_H(v) \cap E(S, \bar{S})$, $Out_S(v) = Out_H(v) \cap E(\bar{S}, S)$,
$In_{\bar{S}}(v) = In_H(v) - In_S(v)$, and $Out_{\bar{S}}(v) = Out_H(v) - Out_S(v)$. Then







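$$\sum_{e \in E(S \cup \{v\}, \bar{S} - \{v\})} f(e) = \sum_{e \in E(S, \bar{S})} f(e) - \sum_{e \in In_S(v)} f(e) + \sum_{e \in Out_{\bar{S}}(v)} f(e) \qquad (3.2)$$

and

$$\sum_{e \in E(\bar{S} - \{v\}, S \cup \{v\})} f(e) = \sum_{e \in E(\bar{S}, S)} f(e) - \sum_{e \in Out_S(v)} f(e) + \sum_{e \in In_{\bar{S}}(v)} f(e). \qquad (3.3)$$

Since $Out_H(v) = Out_S(v) \cup Out_{\bar{S}}(v)$ and $In_H(v) = In_S(v) \cup In_{\bar{S}}(v)$, where both
unions are disjoint, we have

$$\sum_{e \in Out_{\bar{S}}(v)} f(e) + \sum_{e \in Out_S(v)} f(e) = \sum_{e \in Out_H(v)} f(e) \qquad (3.4)$$

and

$$\sum_{e \in In_S(v)} f(e) + \sum_{e \in In_{\bar{S}}(v)} f(e) = \sum_{e \in In_H(v)} f(e). \qquad (3.5)$$

Since $v \in V - \{s, t\}$,

$$\sum_{e \in Out_H(v)} f(e) - \sum_{e \in In_H(v)} f(e) = 0. \qquad (3.6)$$

Subtracting (3.3) from (3.2) and inserting (3.4) and (3.5), we obtain

$$\begin{aligned}
\sum_{e \in E(S \cup \{v\}, \bar{S} - \{v\})} f(e) - \sum_{e \in E(\bar{S} - \{v\}, S \cup \{v\})} f(e)
&= \sum_{e \in E(S, \bar{S})} f(e) - \sum_{e \in E(\bar{S}, S)} f(e) + \sum_{e \in Out_H(v)} f(e) - \sum_{e \in In_H(v)} f(e)\\
&\overset{(3.6)}{=} \sum_{e \in E(S, \bar{S})} f(e) - \sum_{e \in E(\bar{S}, S)} f(e) = F_f,
\end{aligned}$$

where the last equality is the induction hypothesis.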

□ 

Now we define the minimum network cut problem 8, which is strongly
related to the maximum flow problem and will be helpful in the process of
designing a pseudo-polynomial-time algorithm for both problems.

Definition 3.2.3.5. Let $H = ((V, E), c, A, s, t)$ be a network and let $H(S)$ be
a cut of $H$ with respect to an $S \subseteq V$. The capacity of the cut $H(S)$ is

$$c(S) = \sum_{e \in E(S, \bar{S})} c(e).$$

8 Later in Section 3.7 we will call this problem a dual problem to the maximum 
flow problem. 




162 3 Deterministic Approaches 



Thus, the capacity of a cut is the sum of the capacities of all edges going from $S$ to $\overline{S}$. Considering the network in Fig. 3.1, if $S$ contains the 4 leftmost vertices, then $c(S) = 8 + 15 + 7 = 30$. For $S = \{s\}$, $c(S) = 10 + 17 = 27$.
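Lemma 3.2.3.4 and Definition 3.2.3.5 can be checked mechanically on a small example. The following Python sketch is only an illustration: the network (vertices s, a, b, t), its capacities, and its flow are hypothetical and are not the network of Fig. 3.1; edges are represented as (u, v) tuples used as dictionary keys, and the helper names are our own.

```python
from itertools import combinations

# Hypothetical network (not Fig. 3.1): capacities c(e) and a flow f(e) on directed edges (u, v).
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("b", "a"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("b", "a"): 0,
        ("a", "t"): 2, ("b", "t"): 3}


def crossing(edges, S):
    """Edges of E(S, S-bar): tail inside S, head outside S."""
    return [(u, v) for (u, v) in edges if u in S and v not in S]


def flow_value(f, s="s"):
    """F_f: flow leaving the source s minus flow entering it."""
    out_s = sum(x for (u, _), x in f.items() if u == s)
    in_s = sum(x for (_, v), x in f.items() if v == s)
    return out_s - in_s


def net_flow_across(f, S):
    """Right-hand side of (3.1): flow on E(S, S-bar) minus flow on E(S-bar, S)."""
    vertices = {w for e in f for w in e}
    s_bar = vertices - set(S)
    return sum(f[e] for e in crossing(f, S)) - sum(f[e] for e in crossing(f, s_bar))


def cut_capacity(c, S):
    """c(S): total capacity of the edges going from S to S-bar."""
    return sum(c[e] for e in crossing(c, S))


# Every S with s in S and t not in S: same net flow (Lemma 3.2.3.4), never above c(S).
for r in range(3):
    for extra in combinations(["a", "b"], r):
        S = {"s", *extra}
        assert net_flow_across(flow, S) == flow_value(flow)
        assert flow_value(flow) <= cut_capacity(capacity, S)
        print(sorted(S), net_flow_across(flow, S), cut_capacity(capacity, S))
```

For this particular flow, the cuts with $S = \{s\}$, $\{s, a\}$ and $\{s, a, b\}$ all have capacity $5 = F_f$, while $c(\{s, b\}) = 7$; by Lemma 3.2.3.8 below, this hypothetical flow is therefore maximal.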

The minimum network cut problem is to find, for a given network $H$, a cut of $H$ with minimal capacity.

Exercise 3.2.3.6. Find a maximal flow and a minimal cut of the network in Fig. 3.1.

The following results show that both optimization problems for networks are 
strongly related to each other. 

Lemma 3.2.3.7. Let $H = ((V, E), c, A, s, t)$, $A \subseteq \mathbb{R}^+$, be a network. For every flow function $f$ of $H$ and every cut $H(S)$ of $H$,

$$F_f \le c(S).$$

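Proof. Following Lemma 3.2.3.4 and the relations $0 \le f(e) \le c(e)$ for every $e \in E$, we directly obtain for every cut $H(S)$

$$F_f = \sum_{e \in E(S, \overline{S})} f(e) - \sum_{e \in E(\overline{S}, S)} f(e) \le \sum_{e \in E(S, \overline{S})} f(e) \le \sum_{e \in E(S, \overline{S})} c(e) = c(S).$$

□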

Lemma 3.2.3.8. Let $H = ((V, E), c, A, s, t)$, $A \subseteq \mathbb{R}^+$, be a network. Let $f$ be a flow function of $H$ and let $H(S)$ be a cut of $H$. If

$$F_f = c(S),$$

then

(i) $F_f$ is a maximal flow of $H$, and

(ii) $H(S)$ is a minimal cut of $H$.

Proof. This assertion is a direct consequence of Lemma 3.2.3.7. □

To solve the maximum flow problem, we present the method of Ford and Fulkerson. The idea is very simple. One starts with an initial flow (for instance, $f(e) = 0$ for all $e \in E$) and tries to improve it step by step. When it searches for an improvement of a flow that is already optimal, the Ford-Fulkerson algorithm finds a minimal cut of $H$ and thereby recognizes that the current flow is optimal. The core of the algorithm is the way it searches for an improvement; it is based on so-called augmenting paths.

A pseudo-path of a network $H$ (Fig. 3.4) is a sequence

$$v_0, e_0, v_1, e_1, \ldots, e_k, v_{k+1} \quad \text{with } v_0, v_1, \ldots, v_{k+1} \in V,\ s = v_0,\ t = v_{k+1},$$

and $e_0, e_1, \ldots, e_k \in E$, such that it starts in $s$, finishes in $t$, does not contain any vertex twice ($|\{v_0, v_1, \ldots, v_{k+1}\}| = k + 2$), and, for every $i \in \{0, \ldots, k\}$, either $e_i = (v_i, v_{i+1})$ or $e_i = (v_{i+1}, v_i)$.

The main difference between a pseudo-path and a directed path is that the edges of a pseudo-path do not need to have the same direction. For instance, $s, v_1, v_2, v_3, v_4, t$ and $s, v_2, v_1, v_4, t$ determine pseudo-paths of the network in Fig. 3.1 that are not (directed) paths.

An augmenting path with respect to a network $H$ and a flow function $f$ is a pseudo-path $v_0, e_0, v_1, e_1, \ldots, v_k, e_k, v_{k+1}$ such that

(i) for every edge $e_i = (v_i, v_{i+1})$ directed from $s$ to $t$, $f(v_i, v_{i+1}) < c(v_i, v_{i+1})$, and

(ii) for every edge $e_j = (v_{j+1}, v_j)$ directed from $t$ to $s$, $f(v_{j+1}, v_j) > 0$.

If $e_i = (v_i, v_{i+1})$ is an edge directed from $s$ to $t$, then the residual capacity of $e_i$ is

$$res(e_i) = c(v_i, v_{i+1}) - f(v_i, v_{i+1}).$$

If $e_j = (v_{j+1}, v_j)$ is an edge directed from $t$ to $s$, then the residual capacity of $e_j$ is

$$res(e_j) = f(v_{j+1}, v_j).$$

The residual capacity of the augmenting path $P = v_0, e_0, v_1, e_1, \ldots, e_k, v_{k+1}$ is

$$res(P) = \min\{res(e_i) \mid i = 0, \ldots, k\}.$$

If one finds an augmenting path $P$ of $H$ with respect to a flow function $f$, then $F_f$ can be increased by the value $res(P)$.
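The residual capacities follow directly from these definitions. In the minimal Python sketch below, the representation is our own assumption: a pseudo-path is given as a list of (edge, forward) pairs, where forward is True if the edge is directed from $s$ to $t$ along the path. The unit-capacity network and its flow are hypothetical.

```python
def residual(c, f, edge, forward):
    """res(e): c(e) - f(e) for an edge directed from s to t along P, f(e) otherwise."""
    return c[edge] - f[edge] if forward else f[edge]


def path_residual(c, f, path):
    """res(P) = min{res(e_i)}; raises ValueError if P is not an augmenting path."""
    values = [residual(c, f, edge, forward) for edge, forward in path]
    if min(values) <= 0:
        raise ValueError("not an augmenting path: an edge has residual capacity 0")
    return min(values)


# Hypothetical network with unit capacities and a flow of value 1 along s, a, b, t.
c = {("s", "a"): 1, ("s", "b"): 1, ("a", "b"): 1, ("a", "t"): 1, ("b", "t"): 1}
f = {("s", "a"): 1, ("s", "b"): 0, ("a", "b"): 1, ("a", "t"): 0, ("b", "t"): 1}

# The pseudo-path s, (s, b), b, (a, b), a, (a, t), t uses the edge (a, b) backwards.
P = [(("s", "b"), True), (("a", "b"), False), (("a", "t"), True)]
print(path_residual(c, f, P))  # prints 1
```

The backward edge $(a, b)$ contributes its current flow $f(a, b) = 1$, so this flow can still be increased by one unit.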

Lemma 3.2.3.9. Let $H = ((V, E), c, \mathbb{R}^+, s, t)$ be a network and let $f$ be a flow function of $H$. Let $P$ be an augmenting path with respect to $H$ and $f$. Then the function $f': E \to \mathbb{R}^+$ defined by

$f'(e) = f(e)$ if $e$ is not in $P$,
$f'(e) = f(e) + res(P)$ if $e$ is in $P$ and $e$ is directed from $s$ to $t$, and
$f'(e) = f(e) - res(P)$ if $e$ is in $P$ and $e$ is directed from $t$ to $s$,

is a flow function with $F_{f'} = F_f + res(P)$.

Proof. Following the definitions of $f'$ and of $res(e)$ for every edge of $P$, it is obvious that

$$0 \le f'(e) \le c(e)$$

for every $e \in E$. Now, we have to check whether

$$\sum_{e \in In_H(v)} f'(e) - \sum_{e \in Out_H(v)} f'(e) = 0$$

for every $v \in V - \{s, t\}$. If $v$ is not in $P$, then this is obviously true. If $v$ is in $P$, then we consider the four possibilities depicted in Fig. 3.5.

In case (a) the augmenting path is $P = s, \ldots, e, v, h, \ldots, t$, where both $e$ and $h$ are directed from $s$ to $t$. Since $f'(e) = f(e) + res(P)$, the flow incoming to $v$ increases by $res(P)$. Since $f'(h) = f(h) + res(P)$, the flow leaving $v$ increases by $res(P)$, too. The flows of the other edges incident to $v$ do not change, and so the incoming and the outgoing flow of $v$ are balanced again.

In case (b) both the ingoing and the outgoing edge of $v$ are directed from $t$ to $s$; thus the incoming flow to $v$ and the outgoing flow from $v$ both decrease by the same value $res(P)$, and so they remain balanced.

In case (c) the edge $e$ is directed from $s$ to $t$ and the edge $h$ is directed from $t$ to $s$. Thus, the outgoing flow does not change. The incoming flow does not change either, because $e, h \in In_H(v)$ and the flow of $e$ increases by the same value $res(P)$ by which the flow of $h$ decreases.

In case (d) we have $e, h \in Out_H(v)$. As in case (c), neither the incoming nor the outgoing flow of $v$ changes.

Thus, $f'$ is a flow function of $H$. We know that

$$F_{f'} = \sum_{e \in Out_H(s)} f'(e) - \sum_{h \in In_H(s)} f'(h)$$

and that the first edge $g$ of $P$ belongs to $Out_H(s) \cup In_H(s)$; since $P$ visits $s$ only once, $g$ is the only edge incident to $s$ whose flow changes. If $g \in Out_H(s)$, then $f'(g) = f(g) + res(P)$ and so $F_{f'} = F_f + res(P)$. If $g \in In_H(s)$, then $f'(g) = f(g) - res(P)$, and again $F_{f'} = F_f + res(P)$. □
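The update of Lemma 3.2.3.9 is equally short. The Python sketch below (helper names and the unit-capacity example network are hypothetical) builds $f'$ from $f$ and an augmenting path given as (edge, forward) pairs, and checks the two properties just proved: $f'$ is again a flow function and $F_{f'} = F_f + res(P)$.

```python
def augment(c, f, path):
    """Return f' as in Lemma 3.2.3.9; path is a list of (edge, forward) pairs."""
    r = min(c[e] - f[e] if forward else f[e] for e, forward in path)  # res(P)
    if r <= 0:
        raise ValueError("not an augmenting path")
    g = dict(f)
    for e, forward in path:
        g[e] += r if forward else -r  # forward edges gain res(P), backward edges lose it
    return g


def value(f, s):
    """F_f = flow out of s minus flow into s."""
    return sum(x for (u, _), x in f.items() if u == s) - sum(x for (_, v), x in f.items() if v == s)


def is_flow(c, f, s, t):
    """Capacity constraints on every edge and conservation at every vertex except s and t."""
    if any(not 0 <= f[e] <= c[e] for e in c):
        return False
    inner = {w for e in c for w in e} - {s, t}
    return all(sum(x for (_, v), x in f.items() if v == w) ==
               sum(x for (u, _), x in f.items() if u == w) for w in inner)


c = {("s", "a"): 1, ("s", "b"): 1, ("a", "b"): 1, ("a", "t"): 1, ("b", "t"): 1}
f = {("s", "a"): 1, ("s", "b"): 0, ("a", "b"): 1, ("a", "t"): 0, ("b", "t"): 1}
P = [(("s", "b"), True), (("a", "b"), False), (("a", "t"), True)]
g = augment(c, f, P)
assert is_flow(c, g, "s", "t") and value(g, "s") == value(f, "s") + 1
```

Here the backward edge $(a, b)$ loses its unit of flow, which corresponds to case (c) at vertex $b$ and case (d) at vertex $a$ of the proof.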

The only remaining question is whether one can efficiently find an aug- 
menting path if such a path exists. The following algorithm shows that this 
is possible. 



/(e) + res(P\ f{h) + res(P\ 
h 



v 

(a) 



/(e) + res(P) f(h) - res(P) 
s H >4 t t 

e v h 

(c) 



„ m-res(P) ^ f{h)-res{P ) ± 
s r t 

e v h 

(a) 



_ /(e) - res(P) ^ f(h) + res(P) ^ 
s ^ - U r ^ t 

e v h 

(d) 



Fig. 3.5. 







Algorithm 3.2.3.10 (The Ford-Fulkerson Algorithm).

Input: $(V, E)$, $c$, $s$, $t$ of a network $H = ((V, E), c, \mathbb{Q}^+, s, t)$.

Step 1: Determine an initial flow function $f$ of $H$ (for instance, $f(e) = 0$ for all $e \in E$); HALT := 0.

Step 2: $S := \{s\}$; $\overline{S} := V - S$;

Step 3: while $t \notin S$ and HALT = 0 do
          begin find an edge $e = (u, v) \in E(S, \overline{S}) \cup E(\overline{S}, S)$ such that $res(e) > 0$
                {"$c(e) - f(e) > 0$ if $e \in E(S, \overline{S})$, and $f(e) > 0$ if $e \in E(\overline{S}, S)$"};
                if such an edge does not exist then HALT := 1
                else if $e \in E(S, \overline{S})$ then $S := S \cup \{v\}$
                else $S := S \cup \{u\}$;
                $\overline{S} := V - S$
          end

Step 4: if HALT = 1 then return $(f, S)$
        else begin find an augmenting path $P$ from $s$ to $t$ that consists of vertices of $S$ only
                   {"this is possible because both $s$ and $t$ are in $S$"};
                   compute $res(P)$;
                   determine $f'$ from $f$ as described in Lemma 3.2.3.9; $f := f'$
             end;
        goto Step 2
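The following Python function is a minimal sketch of Algorithm 3.2.3.10 under our own assumptions: capacities are nonnegative integers stored in a dictionary keyed by directed edges (u, v), the initial flow is zero, and Step 3 grows $S$ by repeatedly scanning all edges. To make Step 4 easy, the sketch records for every vertex the edge by which it entered $S$, so the augmenting path can be traced back from $t$ to $s$. The function name and the example network are hypothetical.

```python
def ford_fulkerson(c, s, t):
    """Return (f, S): a maximal flow f and the set S of a minimal cut H(S)."""
    f = {e: 0 for e in c}                        # Step 1: f(e) = 0 for all e
    while True:
        S = {s}                                  # Step 2
        entered_by = {}                          # vertex -> (edge, forward) that added it to S
        grown = True
        while t not in S and grown:              # Step 3
            grown = False
            for (u, v) in c:
                if u in S and v not in S and c[(u, v)] - f[(u, v)] > 0:
                    S.add(v)                     # edge of E(S, S-bar) with res(e) > 0
                    entered_by[v] = ((u, v), True)
                    grown = True
                elif v in S and u not in S and f[(u, v)] > 0:
                    S.add(u)                     # edge of E(S-bar, S) with res(e) > 0
                    entered_by[u] = ((u, v), False)
                    grown = True
        if t not in S:                           # HALT = 1: S cannot be extended
            return f, S
        path, w = [], t                          # Step 4: trace P back from t to s
        while w != s:
            (u, v), forward = entered_by[w]
            path.append(((u, v), forward))
            w = u if forward else v
        r = min(c[e] - f[e] if fw else f[e] for e, fw in path)   # res(P)
        for e, fw in path:                       # f := f' as in Lemma 3.2.3.9
            f[e] += r if fw else -r


# Hypothetical unit-capacity network; the second augmentation uses (a, b) backwards.
c = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1, ("s", "b"): 1, ("a", "t"): 1}
f, S = ford_fulkerson(c, "s", "t")
print(sum(f[e] for e in c if e[0] == "s"), S)   # flow value and the source side of a minimal cut
```

In this example the first augmenting path is $s, a, b, t$; the second one, $s, b, a, t$, traverses $(a, b)$ from $t$ to $s$ and cancels its flow, exactly as in Lemma 3.2.3.9. With integer capacities every augmentation raises the flow by at least 1, so the loop terminates.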

Fig. 3.6 illustrates the work of the Ford-Fulkerson algorithm starting with the flow function $f(e) = 0$ for every $e \in E$ (Fig. 3.6(a)). The first augmenting path $P_1$ computed by the algorithm is determined by the sequence of edges $(s, c), (c, d), (d, t)$. Since $res(P_1) = 4$, we obtain $F_{f_1} = 4$ for the flow function $f_1$ depicted in Fig. 3.6(b). The next augmenting path is $P_2$ (Fig. 3.6(b)), defined by the sequence of edges $(s, a), (a, b), (b, c), (c, d), (d, t)$. Since $res(P_2) = 3$, we obtain the flow function $f_2$ with $F_{f_2} = 7$ in Fig. 3.6(c). Now, one can find the augmenting path $P_3$ (Fig. 3.6(c)) determined by $(s, a), (a, b), (b, t)$. Since $res(P_3) = 7$, we obtain the flow function $f_3$ (Fig. 3.6(d)) with $F_{f_3} = 14$. The flow function $f_3$ is an optimal solution because, after reaching $S = \{s, a, b\}$, there is no possibility to extend $S$, and so $H(S)$ is a minimal cut of $H$.

Theorem 3.2.3.11. The Ford-Fulkerson algorithm solves the maximum flow problem and the minimum network cut problem, and it is a pseudo-polynomial-time algorithm for the input instances with capacity functions from $E$ to $\mathbb{N}$.

Proof. The Ford-Fulkerson algorithm halts with a flow function $f$ and a set $S$ with $s \in S$ and $t \notin S$ such that

(i) for all $e \in E(S, \overline{S})$, $f(e) = c(e)$, and

(ii) for all $h \in E(\overline{S}, S)$, $f(h) = 0$.

This implies

$$F_f = \sum_{e \in E(S, \overline{S})} f(e) - \sum_{h \in E(\overline{S}, S)} f(h) = \sum_{e \in E(S, \overline{S})} c(e) = c(S).$$

Following Lemma 3.2.3.8, $f$ is a maximal flow function and $S$ induces a minimal cut of $H$.

Now, we analyze the complexity of the Ford-Fulkerson algorithm. Step 1 can be executed in time $O(|E|)$ and Step 2 in time $O(1)$. One execution of Step 3 costs at most $O(|E|)$ because we look at every edge at most once. Step 4 can be performed in $O(|P|)$ time if one searches for an augmenting path by starting from $t$ and trying to reach $s$. Since the capacities are integers and the algorithm starts with the zero flow, every augmentation increases the flow by $res(P) \ge 1$, and so the number of iterations of the algorithm is at most $F_f = c(S)$. But

$$c(S) \le \sum_{e \in E} c(e) \le |E| \cdot \max\{c(e) \mid e \in E\} = |E| \cdot \text{Max-Int}(H).$$

Thus, the algorithm runs in $O(|E|^2 \cdot \text{Max-Int}(H))$ time. □

A direct consequence of Theorem 3.2.3.11 is the following assertion, which is a starting point for the important concept of linear-programming duality (Section 3.7.4).

Theorem 3.2.3.12 (Max-Flow Min-Cut Theorem). For every instance $I = (G, c, \mathbb{R}^+, s, t)$ of the maximum flow problem (Max-FP) and of the minimum network cut problem (Min-NCP),

$$Opt_{\text{Max-FP}}(I) = Opt_{\text{Min-NCP}}(I).$$



3.2.4 Limits of Applicability 

In this section we ask for a classification of integer-valued problems according to the existence of pseudo-polynomial-time algorithms for them. We show that, by applying the concept of NP-hardness, one can easily derive a technique for proving the nonexistence of pseudo-polynomial-time algorithms for some integer-valued problems, if P ≠ NP.

Definition 3.2.4.1. An integer-valued problem $U$ is called strongly NP-hard if there exists a polynomial $p$ such that the problem $Value(p)$-$U$ is NP-hard.

The following assertion shows that the strong NP-hardness is exactly the 
term we are searching for. 

Theorem 3.2.4.2. Let P ≠ NP, and let $U$ be a strongly NP-hard integer-valued problem. Then there does not exist any pseudo-polynomial-time algorithm solving $U$.







Proof. Since $U$ is strongly NP-hard, there exists a polynomial $p$ such that $Value(p)$-$U$ is NP-hard. Following Theorem 3.2.1.3, the existence of a pseudo-polynomial-time algorithm for $U$ implies the existence of a polynomial-time algorithm for $Value(h)$-$U$ for every polynomial $h$ (and so for $Value(p)$-$U$, too). But the existence of a polynomial-time algorithm for the NP-hard problem $Value(p)$-$U$ would immediately imply P = NP. □

Thus, to prove the nonexistence of any pseudo-polynomial-time algorithm for an integer-valued problem $U$, it is sufficient to show that $Value(h)$-$U$ is NP-hard for a polynomial $h$. We illustrate this approach by showing that TSP is strongly NP-hard. Clearly, TSP can be considered an integer-valued problem because any input instance can be viewed as the collection of integer values corresponding to the costs of the edges of a complete graph.

Lemma 3.2.4.3. TSP is strongly NP-hard.

Proof. Since HC is NP-hard, it is sufficient to prove $HC \le_p Lang_{Value(p)\text{-}TSP}$ for the polynomial $p(n) = n$.

Let $G$ be an input of the Hamiltonian cycle problem. The task is to decide whether $G$ contains a Hamiltonian tour. Let $G = (V, E)$, $|V| = n$ for a positive integer $n$.

We construct a weighted complete graph $(K_n, c)$, where $K_n = (V, E_{com})$, $E_{com} = \{\{u, v\} \mid u, v \in V, u \ne v\}$, and $c: E_{com} \to \{1, 2\}$ is defined by

$c(e) = 1$ if $e \in E$, and
$c(e) = 2$ if $e \notin E$.

Since every tour consists of exactly $n$ edges of cost at least 1, a tour has cost $n$ iff all its edges belong to $E$. Hence $G$ contains a Hamiltonian tour iff $Opt_{TSP}(K_n, c) = n$, i.e., iff $((K_n, c), n) \in Lang_{Value(p)\text{-}TSP}$. Thus, by solving TSP for the input instance $(K_n, c)$ one decides the Hamiltonian cycle problem for the input $G$. □
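The construction in this proof is easy to reproduce. The Python sketch below (function names are ours) builds the cost function $c$ on $K_n$ from a graph $G$ with vertices $0, \ldots, n - 1$ and, for very small $n$ only, computes $Opt_{TSP}(K_n, c)$ by brute force over all tours to confirm that it equals $n$ exactly when $G$ is Hamiltonian. The brute-force part is exponential and serves only as a check; it is not part of the polynomial-time reduction.

```python
from itertools import permutations


def hc_to_tsp(n, edges):
    """Costs on K_n as in Lemma 3.2.4.3: 1 for edges of G, 2 for non-edges."""
    E = {frozenset(e) for e in edges}
    return {frozenset((u, v)): 1 if frozenset((u, v)) in E else 2
            for u in range(n) for v in range(u + 1, n)}


def opt_tsp(n, cost):
    """Brute-force Opt_TSP(K_n, c) for n >= 3; every tour is fixed to start at vertex 0."""
    best = None
    for perm in permutations(range(1, n)):
        tour = (0, *perm, 0)
        total = sum(cost[frozenset((tour[i], tour[i + 1]))] for i in range(n))
        best = total if best is None else min(best, total)
    return best


def is_hamiltonian(n, edges):
    """G has a Hamiltonian tour iff Opt_TSP(K_n, c) = n."""
    return opt_tsp(n, hc_to_tsp(n, edges)) == n


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # Hamiltonian
path = [(0, 1), (1, 2), (2, 3)]            # not Hamiltonian: best tour costs 1 + 1 + 1 + 2
print(is_hamiltonian(4, cycle), is_hamiltonian(4, path))
```

Since all costs are 1 or 2, the instances produced satisfy $\text{Max-Int} \le 2$, which is exactly why the reduction shows strong NP-hardness.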

In the proof of Lemma 3.2.4.3, we showed that TSP is NP-hard even if one restricts the edge costs to the two values 1 and 2. Since such input instances satisfy the triangle inequality ($c(\{u, w\}) \le 2 \le c(\{u, v\}) + c(\{v, w\})$), we obtain that Δ-TSP is strongly NP-hard, too.

We observe that the weighted vertex cover problem (Weight-VCP) is strongly NP-hard because the unweighted version Min-VCP is NP-hard. In general, every weighted version of an optimization graph problem is strongly NP-hard if the original “unweighted” version is NP-hard.



Keywords introduced in Section 3.2 

integer-valued problem, pseudo-polynomial-time algorithm, p-value-bounded subproblem, strongly NP-hard problem, Ford-Fulkerson Algorithm




3.3 Parameterized Complexity 169 



Summary of Section 3.2 

An integer-valued problem is any problem whose input can be viewed as a collection of integers. A pseudo-polynomial-time algorithm for an integer-valued problem is an algorithm whose running time is polynomial in the input size and in the values of the input integers. So, a pseudo-polynomial-time algorithm runs in polynomial time for the input instances whose input values are polynomial in the size of the whole input.

The dynamic programming method can be used to design a pseudo-polynomial-time algorithm for the knapsack problem. The idea is to compute, in a bottom-up manner, a minimal-weight solution for every achievable profit. Another nice example of a pseudo-polynomial-time algorithm is the Ford-Fulkerson algorithm for the maximum flow in networks, which is based on the concept of duality.

An integer-valued problem is strongly NP-hard if it remains NP-hard even for the input instances with small integer values. Thus, if P ≠ NP, strongly NP-hard integer-valued problems do not admit pseudo-polynomial-time algorithms. TSP is a representative of strongly NP-hard problems, because it is NP-hard even for input instances with the values 1 and 2 only.



3.3 Parameterized Complexity 

3.3.1 Basic Concept 

The concept of parameterized complexity is similar to the concept of pseudo-polynomial-time algorithms in the sense that both concepts are based on the first approach of Section 3.1: one tries to analyze a given hard problem more precisely than by considering only its worst-case complexity. In the concept of parameterized complexity the effort is focused on the search for a parameter that partitions the set of all input instances into possibly infinitely many subsets. The idea is to design an algorithm that is polynomial in the input length but possibly not in the value of the chosen parameter. For instance, an algorithm can have the time complexity $2^k \cdot n^2$, where $n$ is the input size and $k$ is the value of a parameter of the given input. Thus, for small $k$ this algorithm can be considered efficient, but for $k = \sqrt{n}$ it is an $n^2 \cdot 2^{\sqrt{n}}$-exponential algorithm. What is important in this concept is that our efforts result in a partition of the set of all input instances into a spectrum of subclasses according to the hardness of particular input instances. In this way one obtains a new insight into the problem by specifying which input instances make the problem hard and for which input instances one can solve the problem efficiently. This moves the thinking about tractability from the classification of problems according to their hardness (as considered in Section 2.3) to the classification of input instances of a particular problem according to their computational difficulty. This is often the best way to attack hard problems in concrete applications.

In what follows we formalize the above described concept. 







Definition 3.3.1.1. Let $U$ be a computing problem, and let $L$ be the language of all instances of $U$. A parameterization of $U$ is any function $Par: L \to \mathbb{N}$ such that

(i) $Par$ is polynomial-time computable, and
(ii) for infinitely many $k \in \mathbb{N}$, the $k$-fixed-parameter set

$$Set_U(k) = \{x \in L \mid Par(x) = k\}$$

is an infinite set.

We say that $A$ is a $Par$-parameterized polynomial-time algorithm for $U$ if

(i) $A$ solves $U$, and
(ii) there exist a polynomial $p$ and a function $f: \mathbb{N} \to \mathbb{N}$ such that, for every $x \in L$,

$$Time_A(x) \le f(Par(x)) \cdot p(|x|).$$

If there exists a $Par$-parameterized polynomial-time algorithm for $U$, then we say that $U$ is fixed-parameter-tractable according to $Par$.

First of all, we observe that condition (ii) of the definition of a parameterization of $U$ is not necessary but useful for the concept of parameterized complexity. We only require it to exclude dummy parameterizations such as $Par(x) = |x|$. In that case, if one takes $f(n) = 2^n$, every $2^n$-exponential algorithm is a $Par$-parameterized polynomial-time algorithm for $U$, which is surely not our aim. Forcing the sets $Set_U(k)$ to be infinite means that the parameter given by $Par$ does not depend on the size of the input instances, in the sense that input instances of arbitrarily large size may have the parameter $k$. For instance, if the input instances are graphs, the parameter may be the degree of the graph, or its cutwidth (bandwidth). We surely have infinitely many graphs with a fixed degree $k > 2$. Obviously, for the choice of $Par$, it is not sufficient to follow only the requirements (i) and (ii) formulated in Definition 3.3.1.1. A good choice of $Par$ should capture the inherent difficulty of particular input instances, and so, when looking for $Par$, one has to ask what makes the considered problem difficult.

If one thinks that a chosen parameterization $Par$ corresponds to one's intuition about the hardness of input instances, the next step is to design an algorithm of complexity $f(Par(x)) \cdot p(|x|)$. The purpose is to make $p$ a small-degree polynomial and $f$ a superpolynomial function that grows as slowly as possible. Obviously, $p$ primarily determines the efficiency of the algorithm for fixed parameters, and $f$ decides for which parameters one can make the problem tractable. Thus, one can view the concept of parameterized complexity as the search for sets $Set_U(i)$ such that the subproblem of $U$ obtained by replacing the set $L$ of all input instances with $\bigcup_{i=1}^{k} Set_U(i)$ becomes tractable. Investigations show that the hardness of many problems is extremely sensitive with respect to the choice of (the restriction on) the set of input instances, and so studies using this approach may be very successful.

In the following section we illustrate the concept of parameterized complexity with the design of simple parameterized polynomial-time algorithms for some NP-hard problems. Section 3.3.3 contains a short discussion about the applicability of this concept.



3.3.2 Applicability of Parameterized Complexity 

First, we observe that the concept of pseudo-polynomial-time algorithms can be viewed as a special case of the concept of parameterized complexity.⁹ For any input instance $x = x_1\#x_2\#\cdots\#x_n$, $x_i \in \{0, 1\}^*$ for $i = 1, \ldots, n$, of an integer-valued problem $U$, one can define

$$Val(x) = \max\{|x_i| \mid i = 1, \ldots, n\}. \qquad (3.7)$$

Obviously, $\text{Max-Int}(x) < 2^{Val(x)}$, and $Val$ is a parameterization of $U$. If $A$ is a pseudo-polynomial-time algorithm for $U$, then there is a polynomial $p$ of two variables such that

$$Time_A(x) \in O\left(p(|x|, \text{Max-Int}(x))\right) \subseteq O\left(p\left(|x|, 2^{Val(x)}\right)\right)$$

for every input instance $x$ of $U$. So, $Time_A(x)$ can be bounded by $2^{d \cdot Val(x)} \cdot |x|^c$ for suitable constants $c$ and $d$. But this implies that $A$ is a $Val$-parameterized polynomial-time algorithm for $U$. Thus, we have proved the following result.

Theorem 3.3.2.1. Let $U$ be an integer-valued problem and let $Val$ be the parameterization of $U$ as defined in (3.7). Then every pseudo-polynomial-time algorithm for $U$ is a $Val$-parameterized polynomial-time algorithm for $U$.



Note that the converse (every $Val$-parameterized polynomial-time algorithm is a pseudo-polynomial-time algorithm) does not hold in general, because a $Val$-parameterized algorithm with complexity $2^{2^{Val(x)}} \cdot p(|x|)$ for a polynomial $p$ is not a pseudo-polynomial-time algorithm.
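The relation between $Val$ and $\text{Max-Int}$ used above is easy to check on a concrete input. The small Python sketch below assumes, as in (3.7), that an input is a string $x_1\#x_2\#\cdots\#x_n$ of nonempty binary words; the function names and the sample input are ours.

```python
def words(x):
    """Split an input x = x_1#x_2#...#x_n into its binary words."""
    return x.split("#")


def val(x):
    """Val(x) = max{|x_i|}: the length of the longest binary word, as in (3.7)."""
    return max(len(w) for w in words(x))


def max_int(x):
    """Max-Int(x): the largest integer represented by one of the words."""
    return max(int(w, 2) for w in words(x))


x = "101#11#1001"
print(val(x), max_int(x), max_int(x) < 2 ** val(x))
```

A word of length $\ell$ represents at most $2^{\ell} - 1$, which is why any bound polynomial in $\text{Max-Int}(x)$ is at most exponential in $Val(x)$.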

Algorithm 3.2.2.2 is a pseudo-polynomial-time algorithm for the knapsack problem. We showed in Theorem 3.2.2.3 that its time complexity is in $O(n^2 \cdot \text{Max-Int}(x))$ for every input $x$. Thus, its complexity is in $O(2^k \cdot n^2)$ for every input from $Set_{KP}(k) = \{x \in \{0, 1, \#\}^* \mid Val(x) = k\}$, i.e., Algorithm 3.2.2.2 is an efficient $Val$-parameterized polynomial-time algorithm for the knapsack problem.

9 Note that the original concept of parameterization did not cover the concept of pseudo-polynomial-time algorithms. We have generalized the concept of parameterized complexity here in order to enable exploring the power of the approach of classifying the hardness of input instances. A more detailed discussion about this generalization is given in Section 3.3.3.

Now we show a simple example of a parameterized polynomial-time algorithm, where the parameter is simply one item of the input. Consider the vertex cover problem where, for an input $(G, k)$, one has to decide whether $G$ possesses a vertex cover of size at most $k$. We define $Par(G, k) = k$ for all inputs $(G, k)$. Obviously, $Par$ is a parameterization of the vertex cover problem. To design a $Par$-parameterized polynomial-time algorithm for VC we use the following two observations.

Observation 3.3.2.2. For every graph $G = (V, E)$ that possesses a vertex cover $S \subseteq V$ of cardinality at most $k$, $S$ must contain all vertices of $V$ with degree greater than $k$.

Proof. Let $u$ be a vertex adjacent to $a > k$ edges. To cover these $a$ edges without taking $u$ into the cover, one must take all $a > k$ neighbors of $u$ into the cover, which is impossible for a cover of cardinality at most $k$. □

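Observation 3.3.2.3. Let $G$ have a vertex cover of size at most $m$ and let the degree of $G$ be bounded by $k$. Then $G$ has at most $m \cdot (k + 1)$ non-isolated vertices, because every non-isolated vertex either belongs to the cover or is one of the at most $k$ neighbors of a cover vertex.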

The following algorithm is based on the idea of taking into the vertex cover all vertices that must be in it because of their high degree, and then executing an exhaustive search for covers on the rest of the graph.

Algorithm 3.3.2.4.

Input: $(G, k)$, where $G = (V, E)$ is a graph and $k$ is a positive integer.

Step 1: Let $H$ contain all vertices of $G$ with degree greater than $k$.
        if $|H| > k$, then output("reject") {Observation 3.3.2.2};
        if $|H| \le k$, then $m := k - |H|$ and $G'$ is the subgraph of $G$ obtained by removing all vertices of $H$ with their incident edges and then removing all isolated vertices.

Step 2: if $G'$ has more than $m(k + 1)$ vertices then output("reject") {Observation 3.3.2.3}.

Step 3: Apply an exhaustive search (by backtracking) for a vertex cover of size at most $m$ in $G'$.
        if $G'$ has a vertex cover of size at most $m$, then output("accept"),
        else output("reject").
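A compact Python sketch of Algorithm 3.3.2.4 follows. Assumptions of this sketch: the graph is simple and given by its edge list; Step 3 is implemented as plain enumeration of vertex subsets rather than a tuned backtracking; and isolated vertices of $G'$ are ignored, since they never have to belong to a vertex cover (a graph of degree at most $k$ with a vertex cover of size $m$ has at most $m(k + 1)$ non-isolated vertices). The function name is hypothetical.

```python
from itertools import combinations


def vertex_cover_kernel(edges, k):
    """Decide whether the simple graph given by `edges` has a vertex cover of size <= k."""
    edges = {frozenset(e) for e in edges}
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    H = {v for v, d in degree.items() if d > k}      # Step 1: vertices forced into the cover
    if len(H) > k:
        return False                                  # Observation 3.3.2.2
    m = k - len(H)
    rest = [e for e in edges if not e & H]            # edges of G' (vertices of H removed)
    vertices = {v for e in rest for v in e}           # non-isolated vertices of G'
    if len(vertices) > m * (k + 1):                   # Step 2
        return False
    for size in range(m + 1):                         # Step 3: exhaustive search
        for cover in combinations(list(vertices), size):
            chosen = set(cover)
            if all(e & chosen for e in rest):
                return True
    return False
```

For example, a star with five leaves has a cover of size 1: its center has degree $5 > 1$, so Step 1 puts it into $H$, after which $G'$ has no edges and Step 3 accepts immediately.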

Theorem 3.3.2.5. Algorithm 3.3.2.4 is a $Par$-parameterized polynomial-time algorithm for VC.

Proof. First, observe that Algorithm 3.3.2.4 solves the vertex cover problem. It rejects the input in Step 1 if $|H| > k$, and this is correct because of Observation 3.3.2.2. If $|H| \le k$, then it takes all vertices of $H$ into the vertex cover, because they must be in any vertex cover of size at most $k$ (again because of Observation 3.3.2.2). So, the question whether $G$ possesses a vertex cover of size at most $k$ is reduced to the equivalent question whether the rest of the graph $G'$ has a vertex cover of size at most $m = k - |H|$. Following Observation 3.3.2.3 and the fact that the degree of $G'$ is at most $k$, Algorithm 3.3.2.4 rejects the input in Step 2 if $G'$ has too many vertices (more than $m(k + 1)$) to have a vertex cover of size $m$. Finally, in Step 3, Algorithm 3.3.2.4 establishes by an exhaustive search whether $G'$ has a vertex cover of size $m$ or not.

Now we analyze the time complexity of this algorithm. Step 1 can be implemented in time $O(n)$, and Step 2 takes $O(1)$ time. An exhaustive search for a vertex cover of size $m$ in a graph with at most $m(k + 1) \le k \cdot (k + 1)$ vertices can be performed in time

$$O\left(k \cdot m \cdot (k + 1) \cdot \binom{m(k + 1)}{m}\right) \subseteq O\left(k^3 \cdot \binom{k(k + 1)}{k}\right) \subseteq O\left(k^{2k}\right),$$

because there are at most $\binom{m(k + 1)}{m}$ different subsets of cardinality $m$ of the set of vertices of $G'$, and $G'$ has at most $k \cdot m \cdot (k + 1)$ edges. Thus, the time complexity of Algorithm 3.3.2.4 is in $O(n + k^{2k})$, which is included in $O(k^{2k} \cdot n)$. □

Since $k^{2k}$ can be large already for small parameters $k$, we present another $Par$-parameterized polynomial-time algorithm for VC. It is based on the following simple fact.

Observation 3.3.2.6. Let $G$ be a graph. For every edge $e = \{u, v\}$, any vertex cover of $G$ contains at least one of the vertices $u$ and $v$.

We consider the following divide-and-conquer strategy. Let $(G, k)$ be an input instance of the vertex cover problem. Take an arbitrary edge $\{v_1, v_2\}$ of $G$. Let $G_i$ be the subgraph of $G$ obtained by removing $v_i$ with all incident edges from $G$, for $i = 1, 2$. Observe that

$$(G, k) \in VC \iff \left[(G_1, k - 1) \in VC \text{ or } (G_2, k - 1) \in VC\right].$$

Obviously, $(G_i, k - 1)$ can be constructed from $G$ in time $O(|V|)$. Since, for every graph $H$, $(H, 1)$ is a trivial problem that can be decided in $O(|V|)$ time, and the recursive reduction of $(G, k)$ to subinstances of $(G, k)$ can result in solving at most $2^k$ subinstances of $(G, k)$, the complexity of this divide-and-conquer algorithm is in $O(2^k \cdot n)$. Thus, this algorithm is a $Par$-parameterized polynomial-time algorithm for VC, and it is surely practical for small values of $k$.
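The divide-and-conquer algorithm can be sketched in a few lines of Python. This is our own minimal version for a simple graph given by its edge list, and the function name is hypothetical. Instead of deciding $(H, 1)$ directly, it recurses down to the trivial cases "no edges left" (accept) and "edges left but $k = 0$" (reject), which keeps the recursion tree binary with depth at most $k$, i.e., with at most $2^k$ leaves.

```python
def vertex_cover_branching(edges, k):
    """Decide whether the simple graph given by `edges` has a vertex cover of size <= k."""
    edges = [frozenset(e) for e in edges]
    if not edges:
        return True                  # nothing left to cover
    if k == 0:
        return False                 # an edge remains but no budget is left
    u, v = tuple(edges[0])           # Observation 3.3.2.6: u or v is in every cover
    without_u = [e for e in edges if u not in e]   # G_1: remove u and its incident edges
    without_v = [e for e in edges if v not in e]   # G_2: remove v and its incident edges
    return (vertex_cover_branching(without_u, k - 1)
            or vertex_cover_branching(without_v, k - 1))
```

Each call does work linear in the number of remaining edges, so the total time is $O(2^k \cdot n)$ for an input of size $n$, matching the bound above.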

Exercise 3.3.2.7. Combine Algorithm 3.3.2.4 and the above divide-and-conquer algorithm to design an algorithm for VC that is faster than the presented ones. □

Exercise 3.3.2.8. Let, for every Boolean function $\Phi$ in CNF, $Var(\Phi)$ be the number of variables occurring in $\Phi$. Prove that Max-Sat is fixed-parameter-tractable according to $Var$. □







Exercise 3.3.2.9. Let $((X, \mathcal{F}), k)$, $\mathcal{F} \subseteq Pot(X)$, be an instance of the decision problem $Lang_{SC}$.¹⁰ Let, for every $x \in X$, $num_{\mathcal{F}}(x)$ be the number of sets in $\mathcal{F}$ that contain $x$. Define

$$Pat((X, \mathcal{F}), k) = \max\{k, \max\{num_{\mathcal{F}}(x) \mid x \in X\}\},$$

which is a parameterization of $Lang_{SC}$. Find a $Pat$-parameterized polynomial-time algorithm for $Lang_{SC}$. □

3.3.3 Discussion 

Here, we briefly discuss the applicability of the concept of parameterized complexity. First of all, observe that fixed-parameter tractability according to a parameterization does not necessarily imply the practical solvability (tractability) of the problem. For instance, this approach does not work when the complexity is similar to $2^{2^k} \cdot n^c$ for some constant $c$. Such a parameterized polynomial-time algorithm is far from being practical even for small $k$, and one has to search for a better one. Another situation in which this approach fails in an application is when one chooses a parameter that remains large relative to the input size for most of the problem instances to be handled. Such an example is presented in Exercise 3.3.2.8, where the parameter $Var(\Phi)$ is the number of Boolean variables in $\Phi$. Usually, $Var(\Phi)$ is related to the number of literals in $\Phi$. Thus, Max-Sat remains a very hard optimization problem despite the fact that it is fixed-parameter tractable according to $Var$. The art of choosing a parameterization lies in finding a parameterization $F$ such that

(i) one can design a practical $F$-parameterized polynomial-time algorithm, and

(ii) most of the problem instances occurring in the considered application have this parameter reasonably small.

If one is interested in proving negative results, one can do it in a way similar to that for pseudo-polynomial-time algorithms. If one proves that the subproblem of $U$ given by the set of input instances $Set_U(k)$ is NP-hard for a fixed constant parameter $k$, then, if P ≠ NP, $U$ is not fixed-parameter tractable according to the considered parameterization.

To see an example, consider the TSP problem and the parameterization $Par(G, c) = \text{Max-Int}(G, c)$. The fact that TSP is strongly NP-hard, claimed in Lemma 3.2.4.3, implies that TSP remains NP-hard even for small $\text{Max-Int}(G, c)$. In fact, we have proved that TSP is NP-hard for problem instances with $c: E \to \{1, 2\}$. Thus, if P ≠ NP, there is no $Par$-parameterized polynomial-time algorithm for TSP.

10 Remember that $Lang_{SC}$ is the threshold language of the set cover problem: $((X, \mathcal{F}), k)$ belongs to $Lang_{SC}$ if there exists a set cover of $(X, \mathcal{F})$ of cardinality at most $k$.




3.4 Branch-and-Bound 175 



Another example is the set cover problem and the parameterization $Card$ defined by $Card(X, \mathcal{F}) = \max\{num_{\mathcal{F}}(x) \mid x \in X\}$ for every input instance¹¹ $(X, \mathcal{F})$ of SC. Since SC restricted to the set of input instances $Set_{SC}(2) = \{(X, \mathcal{F}) \mid Card(X, \mathcal{F}) = 2\}$ contains the minimum vertex cover problem (take the edges as elements and, for every vertex, the set of its incident edges) and Min-VC is NP-hard, there is no $Card$-parameterized polynomial-time algorithm for SC if P ≠ NP.
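The correspondence used in this argument can be written down directly. In the Python sketch below (names are ours), a graph is turned into a set cover instance whose universe $X$ is the edge set and which contains, for every vertex $v$, the set of edges incident to $v$. Every element then lies in exactly two sets, so $Card(X, \mathcal{F}) = 2$, and a family of sets covers $X$ exactly when the corresponding vertices form a vertex cover.

```python
def vc_to_sc(vertices, edges):
    """Set cover instance (X, F): X = edges, one set per vertex containing its incident edges."""
    X = {frozenset(e) for e in edges}
    F = {v: {e for e in X if v in e} for v in vertices}
    return X, F


def card(X, F):
    """Card(X, F) = max over x in X of num_F(x), the number of sets containing x."""
    return max(sum(1 for S in F.values() if x in S) for x in X)


def covers(X, F, chosen):
    """True iff the sets F[v], v in chosen, together cover the universe X."""
    return set().union(*(F[v] for v in chosen)) >= X


X, F = vc_to_sc(range(4), [(0, 1), (1, 2), (2, 3), (3, 0)])
print(card(X, F), covers(X, F, [0, 2]), covers(X, F, [0, 1]))
```

For the 4-cycle used here, the vertices $\{0, 2\}$ form a vertex cover and their sets cover $X$, while $\{0, 1\}$ leaves the edge $\{2, 3\}$ uncovered.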



Keywords introduced in Section 3.3 

parameterization of a problem, parameterized polynomial-time algorithms, fixed- 
parameter tractability 

Summary of Section 3.3 

The concept of parameterized complexity is a generalization of the concept of pseudo-polynomial-time algorithms. One tries to partition the set of all input instances of a hard problem according to a parameter determined by the input instances in such a way that it is possible to design an algorithm that works in polynomial time with respect to the input size but not necessarily with respect to the parameter. Such an algorithm may then provide an efficient solution for input instances whose parameters are reasonably small. This can contribute to the search for the border between the easy and the hard input instances of the given problem.

The main difficulty in applying this approach is to find a parameter that realistically captures the hardness of the problem instances and simultaneously enables the design of an algorithm that is efficient for instances with restricted parameter sizes.



3.4 Branch-and-Bound 
3.4.1 Basic Concept 

Branch-and-bound is a method for the design of algorithms for optimization problems. We use it if we unconditionally want to find an optimal solution, whatever the amount of work (if executable) may be. Branch-and-bound is based on backtracking, which is an exhaustive search technique in the space of all feasible solutions. The main problem is that the sets of feasible solutions typically have cardinalities as large as $2^n$, $n!$, or even $n^n$ for inputs of size $n$. The rough idea of the branch-and-bound technique is to speed up backtracking by omitting the search in some parts of the space of feasible solutions, because one is already able to recognize that these parts do not contain any optimal solution at the moment when the exhaustive search would start to search in them (to generate their solutions).

11 See Exercise 3.3.2.9 for the definition of $num_{\mathcal{F}}(x)$.







Remember that backtracking (as described in Section 2.3.4) can be viewed as depth-first search or another search strategy in the labeled rooted tree $T_{M(x)}$, whose leaves are labeled by feasible solutions from $M(x)$ and in which every internal vertex $v$ of $T_{M(x)}$ is labeled by the set $S_v \subseteq M(x)$ that contains all feasible solutions that are labels of the leaves of the subtree $T_v$ rooted at $v$. Branch-and-bound is nothing else than cutting $T_v$ from $T_{M(x)}$ if the algorithm is able to determine, at the moment when $v$ is visited (generated), that $T_v$ does not contain any optimal solution. The efficiency of this approach depends on the number and the sizes of the subtrees of $T_{M(x)}$ that may be cut during the execution of the algorithm.

The simplest version of the branch-and-bound technique has already been mentioned in Section 2.3.4. Being in a vertex $v$, one compares the cost of the best solution found so far with bounds on the minimal or maximal costs of feasible solutions in $S_v$ of $T_v$. This comparison can be done because the specification of $S_v$ usually enables one to efficiently estimate a range of the costs of feasible solutions in $S_v$. If the cost of the preliminary best solution is definitely better than any cost in the estimated range, one cuts $T_v$ (i.e., one omits the search in $T_v$). The efficiency of this naive branch-and-bound approach may vary significantly depending on the input instance as well as on the way in which the tree $T_{M(x)}$ has been created.¹² This simple approach alone is usually not very successful in solving hard optimization problems in practice.

The standard version of branch-and-bound is based on the precomputation
of a bound on the cost of an optimal solution. More precisely, one
precomputes a lower bound on the optimal cost for maximization problems
and an upper bound on the optimal cost for minimization problems. The
standard techniques for computing such bounds are¹³

(i) approximation algorithms (presented in Chapter 4), 

(ii) relaxation by reduction to linear programming (presented in Section 3.7), 

(iii) random sampling (presented in Chapter 5), 

(iv) local search (presented in Section 3.6), and 

(v) heuristic methods such as simulated annealing and genetic algorithms 
(presented in Chapter 6). 

The precomputed bound is then used to cut all subtrees of $T_{M(x)}$ whose solutions
are not good enough to reach the cost of this bound. If one succeeds in
computing a good bound, the time complexity of the branch-and-bound
procedure may decrease substantially, and the approach becomes practical for
the problem instance considered.

12 That is, which specification has been used to branch in $M(x)$.

13 Note that this is not an exhaustive list and that one can use a combination of 
several such techniques in order to get a reasonable bound on the cost of an 
optimal solution. 




3.4 Branch-and-Bound 177 



The rest of this section is organized as follows. Section 3.4.2 shows
two examples of applications of the branch-and-bound method, for Max-Sat
and for TSP. Section 3.4.3 discusses the advantages and
drawbacks of the branch-and-bound method.

3.4.2 Applications for MAX-SAT and TSP 

In this section we illustrate how the branch-and-bound method works on some
problem instances of Max-Sat and TSP. First, we consider the straightforward
version of the branch-and-bound technique for Max-Sat, without any
precomputation. So, one starts backtracking without any bound on the optimal
cost. After finding a feasible solution, one can cut a search subtree if it
does not contain any feasible solution better than the best solution found so
far. Backtracking is very simple for Max-Sat, because every assignment
of Boolean values to the variables of the input formula $\Phi$ is a feasible solution
to $\Phi$. Thus, in every inner vertex of the search tree one branches according to
two possibilities, $x_i = 1$ and $x_i = 0$, for some variable $x_i$. Figure 3.7 shows the complete search
tree $T_{M(\Phi)}$ for the formula

^{x \ , x 2 , x 3 , X/{) — (xi Vx 2 ) A (xi V x 3 V X4) A (xi V x 2 ) 







A(x\ V £3 V £4) A (x2 V £3 V £4) A (£1 V £3 V £4) 

AX3 A (#i V X4) A (ofi V £3) A x \ . 

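Observe that $\Phi$ consists of 10 clauses and that it is not satisfiable: the unit
clauses $x_1$ and $x_3$ force $x_1 = x_3 = 1$, which falsifies the clause $\bar{x}_1 \lor \bar{x}_3$. Each
of the assignments 1111, 1110, 1101, 1100 satisfies 9 clauses, and so all these
assignments are optimal solutions for $\Phi$.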




Fig. 3.8. 



Figure 3.8 shows the part of the tree that corresponds to the branch-and-bound
implementation by depth-first search in which, for every inner
vertex $v$, the left son of $v$ is always visited before the right son of $v$.
The first feasible solution is 1111 ($x_1 = 1, x_2 = 1, x_3 = 1, x_4 = 1$), with cost
9. Since $\Phi$ has only 10 clauses, consider any partial assignment $\alpha$ for which at
least one clause is unsatisfied by every extension of $\alpha$ to a full assignment. None
of these extensions can satisfy more than 9 clauses, so we do not need to look at
them. So, being in
the vertex corresponding to the partial assignment $x_1 = 1$, $x_2 = 1$, $x_3 = 1$,
we see that the clause $\bar{x}_1 \lor \bar{x}_3$ cannot be satisfied by any choice of the free
variable $x_4$, and we cut the right subtree of this vertex. Similarly:

• for the partial assignment $x_1 = 1, x_2 = 1, x_3 = 0$, the clause $x_3$ cannot be
satisfied,

• for the partial assignment $x_1 = 1, x_2 = 0$, the clause $\bar{x}_1 \lor x_2$ cannot be
satisfied, and

• for the partial assignment $x_1 = 0$, the clause $x_1$ cannot be satisfied.

Thus, the branch-and-bound performed by the described depth-first search
has reduced the backtrack method to the visitation (generation) of 8 vertices
out of the 31 vertices of $T_{M(\Phi)}$.

Figure 3.9 shows the depth-first search that, at every inner vertex of $T_{M(\Phi)}$,
first visits the right son (i.e., follows the choice $x_i = 0$ first). This search visits
(generates) 18 vertices of $T_{M(\Phi)}$. It first finds the feasible
solution 0000 with cost 7, then improves to 0011 with cost 8, and finally
finishes with 1100 with the optimal cost 9. For instance, the subtree of the inner
vertex corresponding to the partial assignment $x_1 = 0$, $x_2 = 1$ is not visited
because

(i) the clauses $x_1 \lor \bar{x}_2$ and $x_1$ cannot be satisfied by any extension of this
partial assignment, and

(ii) the best feasible solution found so far satisfies 8 of the 10 clauses of
$\Phi$.
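The following Python sketch mirrors the depth-first branch-and-bound just described. A vertex is cut as soon as the clauses already falsified by its partial assignment show that no extension can beat the best assignment found so far. The bound is re-checked before each son, because the best cost may improve while the first son is explored. The extracted text lost the negation bars of $\Phi$, so the clause list `PHI` is a hypothetical 10-clause formula, not necessarily the formula of Figure 3.7. It was chosen to agree with every fact the text states about $\Phi$:

- it is unsatisfiable;
- 1111, 1110, 1101 and 1100 each satisfy 9 clauses;
- 0000 satisfies 7 clauses and 0011 satisfies 8.

The function name `max_sat_bb` and the literal encoding are our own.

```python
from typing import List, Optional, Tuple

# Literal +i stands for x_i and -i for its negation (variables numbered from 1).
# Hypothetical stand-in for Phi: the negation pattern of some clauses is an assumption.
PHI: List[List[int]] = [
    [1, -2], [1, 3, -4], [-1, 2], [1, -3, 4], [2, 3, -4],
    [1, -3, -4], [3], [1, 4], [-1, -3], [1],
]


def max_sat_bb(clauses: List[List[int]], n: int, order=(1, 0)):
    """Depth-first branch-and-bound for Max-Sat without a precomputed bound.

    `order` is the value tried first at every inner vertex: (1, 0) visits the
    left son (x_i = 1) first, (0, 1) the right son (x_i = 0) first.
    Returns (best cost, best assignment, number of generated vertices).
    """
    best_cost = -1
    best_assignment: Optional[Tuple[int, ...]] = None
    generated = 0

    def upper_bound(partial: List[int]) -> int:
        # A clause whose literals are all fixed to false stays unsatisfied in every
        # extension of `partial`; every other clause might still be satisfied.
        falsified = sum(
            all(abs(lit) <= len(partial) and partial[abs(lit) - 1] != (lit > 0)
                for lit in clause)
            for clause in clauses
        )
        return len(clauses) - falsified

    def visit(partial: List[int]) -> None:
        nonlocal best_cost, best_assignment, generated
        generated += 1
        if len(partial) == n:  # leaf: the bound equals the number of satisfied clauses
            cost = upper_bound(partial)
            if cost > best_cost:
                best_cost, best_assignment = cost, tuple(partial)
            return
        for value in order:
            # Re-check before each son: the best cost may have improved meanwhile.
            if upper_bound(partial) <= best_cost:
                return  # cut: no extension beats the best solution found so far
            visit(partial + [value])

    visit([])
    return best_cost, best_assignment, generated


print(max_sat_bb(PHI, 4, order=(1, 0)))  # -> (9, (1, 1, 1, 1), 8)
print(max_sat_bb(PHI, 4, order=(0, 1)))  # -> (9, (1, 1, 0, 0), 18)
```

With these assumed clauses, the sketch reproduces the counts stated for Figures 3.8 and 3.9: 8 and 18 of the 31 vertices. This holds when vertices that are generated and then immediately cut are counted as visited.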

Comparing Figures 3.8 and 3.9, we see that the efficiency of branch-and-bound
depends on the kind of search in $T_{M(\Phi)}$. One can easily observe that
breadth-first search yields a search subtree of $T_{M(\Phi)}$ that differs
from the subtrees of Figures 3.8 and 3.9. Using another order of variables
to build the backtrack tree for $\Phi$ again leads to a different complexity of
the branch-and-bound method. So, the main observation is that the efficiency
of branch-and-bound may depend essentially on

• the search strategy in the tree $T_{M(\Phi)}$, and

• the way $T_{M(\Phi)}$ is built by backtracking.

Moreover, distinct problem instances may require different search and backtrack
strategies to be solved efficiently. So, any implementation of the branch-and-bound
method for a hard problem may be suitable for some input instances,
but not for others.

Exercise 3.4.2.1. Perform branch-and-bound on $T_{M(\Phi)}$ in Figure 3.7 by
breadth-first search and compare its time complexity (the number of generated
vertices) with the depth-first search strategies depicted in Figures 3.8
and 3.9. □

Exercise 3.4.2.2. Take the ordering $x_4, x_3, x_1, x_2$ of the input variables of
the formula $\Phi(x_1, x_2, x_3, x_4)$ and build the backtrack tree $T_{M(\Phi)}$ according to
this variable ordering. Use this $T_{M(\Phi)}$ as the basis for the branch-and-bound
method. Considering different search strategies, compare the number of visited
vertices with the branch-and-bound implementations presented in Figures 3.8
and 3.9. □

Now we illustrate how the branch-and-bound method with a precomputation
works on TSP. To decide whether to cut a subtree of $T_{M(I)}$ in a vertex
$v$, we use a straightforward strategy. Consider backtracking for TSP as presented
in Section 2.3.4. We compute a lower bound on the costs of the feasible
solutions in $T_v$ as the sum of the costs of all edges taken into $S_v$, plus the
number of edges still missing to build a Hamiltonian tour times the minimal cost
over all edges.
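Here is a minimal Python sketch of this bound inside a depth-first branch-and-bound. To keep it short, it branches by extending a path that starts at vertex 1. This branching is a simplification of ours, not the edge-inclusion tree $S(h_1, \ldots, h_r, \bar{e}_1, \ldots, \bar{e}_s)$ of Section 2.3.4. The cut rule, however, is the one above: the cost of the edges taken so far, plus the number of missing edges times the cheapest edge cost. Figure 3.10 is not reproduced here, so the edge costs in `COST` are an assumption. They are the unique values obtained by solving, for the ten edge costs, the tour costs quoted in the walk-through of Figure 3.11 below. The function name is our own.

```python
import math
from typing import Dict, Optional, Tuple

# Edge costs of a 5-vertex instance, keyed by (u, v) with u < v.
# Assumption: reconstructed from the tour costs quoted in the text, not read off Fig. 3.10.
COST: Dict[Tuple[int, int], int] = {
    (1, 2): 1, (1, 3): 3, (1, 4): 2, (1, 5): 2, (2, 3): 2,
    (2, 4): 7, (2, 5): 1, (3, 4): 1, (3, 5): 8, (4, 5): 6,
}


def tsp_branch_and_bound(n: int, cost: Dict[Tuple[int, int], int],
                         upper_bound: float = math.inf):
    """Depth-first branch-and-bound for TSP on K_n (vertices 1..n).

    Returns (best cost, best tour or None, number of visited search-tree vertices).
    """
    def c(u: int, v: int) -> int:
        return cost[(min(u, v), max(u, v))]

    c_min = min(cost.values())
    best_cost: float = upper_bound
    best_tour: Optional[Tuple[int, ...]] = None
    visited = 0

    def search(path, taken):
        nonlocal best_cost, best_tour, visited
        visited += 1
        if len(path) == n:  # all vertices used: close the tour back to vertex 1
            total = taken + c(path[-1], path[0])
            if total < best_cost or (best_tour is None and total <= best_cost):
                best_cost, best_tour = total, tuple(path)
            return
        for v in range(2, n + 1):
            if v in path:
                continue
            new_taken = taken + c(path[-1], v)
            # The extended path has len(path) edges, so n - len(path) edges are missing.
            bound = new_taken + (n - len(path)) * c_min
            if bound > best_cost:  # every tour in this subtree costs more than best_cost
                continue
            search(path + [v], new_taken)

    search([1], 0)
    return best_cost, best_tour, visited


with_bound = tsp_branch_and_bound(5, COST, upper_bound=10)
without_bound = tsp_branch_and_bound(5, COST)
print(with_bound[:2], without_bound[:2])  # (8, (1, 4, 3, 2, 5)) (8, (1, 4, 3, 2, 5))
print("visited:", with_bound[2], "vs", without_bound[2])
```

Both runs find the same optimal tour. The precomputed bound 10 only reduces the number of visited vertices: already the path 1, 2, 3, 4, 5 gets the bound $1 + 2 + 1 + 6 + 1 = 11 > 10$.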

Figure 3.11 shows the application of this branch-and-bound strategy to the
instance $I$ of TSP depicted in Figure 3.10, starting with a precomputed upper
bound 10 on the optimal cost. As introduced in Section 2.3.4,
$S(h_1, \ldots, h_r, \bar{e}_1, \ldots, \bar{e}_s)$ denotes the subset of $M(I)$ that contains all
Hamiltonian tours that contain all edges $h_1, \ldots, h_r$ and none of the
edges $e_1, \ldots, e_s$. In generating the corresponding tree $T_{M(I)}$ we frequently
use the following two observations.

Observation 3.4.2.3. Let $S$ be a set of feasible solutions for the problem instance
$I$ from Figure 3.10, and let $v$ be a vertex. If no solution in $S$ contains either of two
given edges incident to $v$, then every solution in $S$ must contain the other
two edges incident to $v$.

Proof. Every vertex of $K_5$ has degree 4, and every Hamiltonian tour contains exactly
two edges incident to $v$, for every vertex $v$ of $K_5$. □

Observation 3.4.2.4. For any three different edges $e, h, l$ that form a
path of length 3 in $K_5$, $|S(e, h, l)| \le 1$ (i.e., every Hamiltonian tour is
unambiguously determined by any of its subpaths of length 3).








The first branch in Figure 3.11 is done according to the membership of
$e_{12}$ in the solution. $S(e_{12})$ is branched according to the membership of $e_{23}$
in the solution, and $S(e_{12}, e_{23})$ according to the membership of
$e_{34}$. We have $|S(e_{12}, e_{23}, e_{34})| = 1$, because $v_1, v_2, v_3, v_4, v_5, v_1$ is the only
Hamiltonian tour with the subpath $v_1, v_2, v_3, v_4$. So, our first feasible solution
has cost 12. The next feasible solution, $v_1, v_2, v_3, v_5, v_4, v_1$, is the only element
of $S(e_{12}, e_{23}, \bar{e}_{34})$, and its cost is 19.

The upper bound 10 is used for the first time when the set
$S(e_{12}, \bar{e}_{23}, e_{24})$ is reached. Here $c(e_{12}) + c(e_{24}) = 1 + 7 = 8$,
so any Hamiltonian tour containing these two edges must have cost at
least $8 + 3 \cdot 1 = 11 > 10$. Since $S(e_{12}, \bar{e}_{23}, \bar{e}_{24}) = S(e_{12}, e_{25})$, the next
branch is done according to $e_{35}$. Since $c(e_{35}) = 8$, no Hamiltonian tour containing
both $e_{12}$ and $e_{35}$ can cost less than $1 + 8 + 3 \cdot 1 = 12$. So we do not need to work
with $S(e_{12}, \bar{e}_{23}, \bar{e}_{24}, e_{35})$ in what follows. The set
$S(e_{12}, \bar{e}_{23}, \bar{e}_{24}, \bar{e}_{35}) = S(e_{12}, e_{25}, e_{45})$ contains only the
solution $v_1, v_2, v_5, v_4, v_3, v_1$, of cost 12.

Let us now have a short look at the leaves of the right subtree $T_{S(\bar{e}_{12})}$ of
$T_{M(I)}$. We can again bound the cost of the solutions in $S(\bar{e}_{12}, e_{13}, e_{35})$:
$c(e_{13}) + c(e_{35}) = 3 + 8 = 11$, so any solution in $S(\bar{e}_{12}, e_{13}, e_{35})$ costs at least
$11 + 3 \cdot 1 = 14$. Further, $S(\bar{e}_{12}, e_{13}, \bar{e}_{35}, e_{23}, e_{24}) = S(e_{13}, e_{23}, e_{24}) = \{(v_1, v_3, v_2, v_4, v_5, v_1)\}$,
and its cost is 20. Next, $S(\bar{e}_{12}, e_{13}, \bar{e}_{35}, e_{23}, \bar{e}_{24}) = S(e_{13}, e_{23}, e_{25}) = \{(v_1, v_3, v_2, v_5, v_4, v_1)\}$,
with cost 14. Following Observation 3.4.2.3,
$S(\bar{e}_{12}, e_{13}, \bar{e}_{35}, \bar{e}_{23}) = S(e_{13}, e_{34}, \bar{e}_{12}) = S(e_{13}, e_{34}, e_{15}) = \{(v_1, v_3, v_4, v_2, v_5, v_1)\}$,
and its cost is 14. Applying Observation 3.4.2.3 again, we obtain
$S(\bar{e}_{12}, \bar{e}_{13}) = S(e_{14}, e_{15})$, and we branch this set according to $e_{24}$. The only
solution of $S(\bar{e}_{12}, \bar{e}_{13}, e_{24}) = S(e_{14}, e_{15}, e_{24})$ is $v_1, v_4, v_2, v_3, v_5, v_1$, and
its cost is 21. On the other hand,
$S(\bar{e}_{12}, \bar{e}_{13}, \bar{e}_{24}) = S(e_{14}, e_{15}, \bar{e}_{24}) = S(e_{14}, e_{15}, e_{34}) = \{(v_1, v_4, v_3, v_2, v_5, v_1)\}$.
This is the optimal solution, and its cost is 8.







We observe that the optimal solution $v_1, v_4, v_3, v_2, v_5, v_1$ corresponds to
the last leaf visited in the depth-first search of $T_{M(I)}$. So, without
the precomputed upper bound 10, one would generate the whole tree $T_{M(I)}$
by backtracking. Obviously, another ordering of the edges used for branching
or another search strategy in $T_{M(I)}$ could result in a different number
of visited vertices (a different time complexity).
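The numbers in this walk-through can be checked mechanically. The Python sketch below enumerates all 12 Hamiltonian tours of $K_5$ and evaluates the bound used above on a few edge sets. Figure 3.10 is not reproduced here, so the edge costs are an assumption: they are the unique values consistent with the edge costs $c(e_{12}) = 1$, $c(e_{13}) = 3$, $c(e_{24}) = 7$, $c(e_{35}) = 8$ and the tour costs 12, 19, 12, 20, 14, 14, 21 and 8 quoted in the text. The helper names are our own.

```python
from itertools import permutations

# Assumed edge costs (u < v), obtained by solving the tour costs quoted in the text.
COST = {(1, 2): 1, (1, 3): 3, (1, 4): 2, (1, 5): 2, (2, 3): 2,
        (2, 4): 7, (2, 5): 1, (3, 4): 1, (3, 5): 8, (4, 5): 6}
C_MIN = min(COST.values())


def c(u, v):
    return COST[(min(u, v), max(u, v))]


def tour_cost(tour):
    """Cost of the closed tour tour[0], tour[1], ..., tour[-1], tour[0]."""
    return sum(c(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def all_tours(n=5):
    """Every Hamiltonian tour of K_n exactly once: start at vertex 1 and keep
    one of the two directions (second vertex smaller than the last one)."""
    for rest in permutations(range(2, n + 1)):
        if rest[0] < rest[-1]:
            yield (1,) + rest


def lower_bound(taken, n=5):
    """Sum of the taken edges plus (missing edges) * (cheapest edge cost)."""
    return sum(c(u, v) for u, v in taken) + (n - len(taken)) * C_MIN


tours = sorted(all_tours(), key=tour_cost)
print(len(tours), tours[0], tour_cost(tours[0]))  # 12 (1, 4, 3, 2, 5) 8
print(lower_bound([(1, 2), (2, 4)]))              # 11 > 10, so the subtree is cut
print(lower_bound([(1, 3), (3, 5)]))              # 14
```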

3.4.3 Discussion 

Assuming P ≠ NP, it is clear that the branch-and-bound method, even with a
clever polynomial-time algorithmic precomputation of bounds on the cost of
the optimal solutions, cannot provide a polynomial-time branch-and-bound
algorithm for an NP-hard problem. The only benefit one can expect
from this method is that it works within reasonable time for many of the
problem instances appearing in concrete applications.

Even if one can precompute very good bounds by some approximation
algorithm, there may exist input instances that have exponentially many feasible
solutions with costs between the optimal cost and the precomputed
bound. Obviously, this may lead to an exponential time complexity of the
branch-and-bound method.

Since there is no general strategy for searching in $T_{M(x)}$ that would be the
best strategy for every input instance of a hard problem, one has to concentrate
on the following two parts of the branch-and-bound method in order to
increase its efficiency:

(i) to search for a good algorithm that computes bounds as close as
possible to $Opt_U(I)$ for every input instance $I$ of the optimization problem
$U$, and

(ii) to find a clever, efficient strategy for computing bounds on the costs of
the solutions in the sets assigned to the vertices of $T_{M(I)}$.

One can be willing to invest more time in searching for a good bound on
$Opt_U(I)$, because this bound can be essential for the complexity of the
most complex part of the run, the search in $T_{M(I)}$. In contrast, the
procedure estimating a bound on a given set of solutions must be
very efficient, because it is used in every generated vertex, and the number of
generated vertices is usually large.

The last question we want to discuss here is what to do if branch-and-bound
runs on some problem instances longer than one can wait for the result.
One possibility is simply to stop the search and to output the best feasible
solution found so far. Another possibility is to prevent overly long runs from
the beginning by relaxing the hard requirement of finding
an optimal solution to the requirement of finding a solution whose cost does
not differ by more than t% from the optimal cost.¹⁴ In that case, one can cut
any subtree $T_v$ from $T_{M(x)}$ if the bound on $S_v$ shows that no solution in $S_v$ is
more than t% better than the best feasible solution found so far.

14 An algorithm producing outputs with this property is called a $(1 + t/100)$-approximation
algorithm; such algorithms will be studied in Chapter 4.
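The relaxed cut rule fits in a one-line test. The Python sketch below reads "t% better" in terms of the factor $1 + t/100$ of a $(1 + t/100)$-approximation; this reading is our assumption. Consider a minimization problem, and suppose $T_v$ is cut only when the lower bound $b$ on $S_v$ satisfies $b \cdot (1 + t/100) \ge best$. Then every solution in $S_v$, including an optimal one if it lies there, costs at least $best/(1 + t/100)$. The final output is never worse than the current $best$, so it is within a factor $1 + t/100$ of the optimum. For maximization the roles are mirrored. The function name `can_cut` is our own.

```python
def can_cut(bound: float, best: float, t: float = 0.0, minimize: bool = True) -> bool:
    """Relaxed cut test for a vertex v of the backtrack tree.

    bound: a lower bound (minimization) or an upper bound (maximization)
           on the costs of all solutions in S_v.
    best:  the cost of the best feasible solution found so far.
    t:     the admitted deviation from the optimal cost, in percent.
    """
    factor = 1 + t / 100
    if minimize:
        # Every solution in S_v costs at least `bound`; cut if that is >= best / factor.
        return bound * factor >= best
    # Maximization: cut if no solution in S_v is worth more than best * factor.
    return bound <= best * factor


print(can_cut(9.5, 10, t=10))  # True
print(can_cut(9.5, 10, t=0))   # False
```

With `t = 0` the test reduces to the exact rule: cut whenever no solution in $S_v$ can be strictly better than the best one found so far.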

Summary of Section 3.4 

The branch-and-bound method is an algorithm design technique for solving opti- 
mization problems. It makes backtracking more efficient by omitting the generation 
of some subtrees of the backtrack tree, because it is possible to recognize that 
these parts do not contain any optimal solution or they do not contain any fea- 
sible solution that is better than the best solution found up till now. Usually, a 
branch-and-bound algorithm consists of the following two parts: 

(i) The computation of a bound on the optimal cost. 

(ii) The use of the bound computed in (i) to cut some subtrees of the backtrack 
tree. 

The precomputed part (i) is usually done by another algorithm design method 
(approximation algorithms, relaxation to linear programming, local search, etc.), 
and the quality of the precomputed bound can have an essential influence on the 
efficiency of the branch-and-bound method. 



3.5 Lowering Worst Case Complexity of Exponential 
Algorithms 

3.5.1 Basic Concept 

This concept is similar to the idea of branch-and-bound in the sense that
one designs an algorithm that solves a given problem and is even prepared
to accept an exponential complexity. But in contrast to branch-and-bound,
where the complexity varies substantially from input to input, here one requires
the worst case complexity of the designed algorithm to be substantially
smaller than the complexity of any naive approach. For instance, consider an
optimization problem whose instances of size $n$ have sets of feasible solutions
of cardinality $2^n$. Then a worst case complexity of $c^n$ is an
essential improvement for every $c < 2$. Thus, in this approach one accepts
algorithms of exponential complexity if they are practical for input sizes for
which straightforward exponential algorithms are no longer practical.

Figure 3.12 documents the potential usefulness of this approach. If one
considers a complexity of $10^{16}$ as the border between current tractability
and intractability, then an algorithm of complexity $(1.2)^n$
can be run in a few seconds for $n = 100$, and an algorithm of
complexity $n^2 \cdot 2^{\sqrt{n}}$ is even tractable¹⁵ for $n = 300$. On the other hand,
applying an algorithm of complexity $2^n$ to inputs of size 50 is



15 Even for $n = 300$, $n^2 \cdot 2^{\sqrt{n}}$ is a number consisting of only 11 digits.




3.5 Lowering Worst Case Complexity of Exponential Algorithms 185 



already on the border of tractability. Thus, improving the complexity in the
exponential range may contribute essentially to the practical solvability of
hard problems.



Complexity       n = 10   n = 50          n = 100           n = 300
2^n              1024     (16 digits)     (31 digits)       (91 digits)
2^(n/2)          32       ≈ 33 · 10^6     (16 digits)       (46 digits)
(1.2)^n          7        9100            ≈ 83 · 10^6       (24 digits)
10 · 2^√n        89       1350            10240             ≈ 1.64 · 10^6
n^2 · 2^√n       894      ≈ 336000        ≈ 10.24 · 10^6    ≈ 14.8 · 10^9

Fig. 3.12.
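The entries of Fig. 3.12 are easy to recompute. The Python sketch below evaluates the five functions in floating point and counts decimal digits via $\log_{10}$. The dictionary keys are our own labels for the rows.

```python
import math

SIZES = (10, 50, 100, 300)

# The running-time functions of Fig. 3.12 (number of elementary steps).
FUNCTIONS = {
    "2^n": lambda n: 2.0 ** n,
    "2^(n/2)": lambda n: 2.0 ** (n / 2),
    "(1.2)^n": lambda n: 1.2 ** n,
    "10*2^sqrt(n)": lambda n: 10 * 2.0 ** math.sqrt(n),
    "n^2*2^sqrt(n)": lambda n: n ** 2 * 2.0 ** math.sqrt(n),
}


def digits(x: float) -> int:
    """Number of decimal digits of the integer part of x (for x >= 1)."""
    return math.floor(math.log10(x)) + 1


if __name__ == "__main__":
    for name, f in FUNCTIONS.items():
        cells = [f"{f(n):.3g} ({digits(f(n))} digits)" for n in SIZES]
        print(f"{name:>14}: " + " | ".join(cells))
```

Running it shows, for instance, that $(1.2)^{100} \approx 8.3 \cdot 10^7$ and that $n^2 \cdot 2^{\sqrt{n}}$ has 11 digits for $n = 300$.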



This approach can be used for any kind of algorithmic problem. Usually, 
to be successful with it one needs to analyze the specific problem considered, 
and use the obtained knowledge in combination with a fundamental algorithm 
design technique. Currently, there exists no concept that would provide a 
general strategy for designing practical exponential algorithms for a large class 
of algorithmic problems, and there is no robust theory explaining which hard
problems could possess algorithms of complexity $p(n) \cdot c^n$ for a polynomial $p$
and a constant $c < 2$.

In the next section we show a simple application of this approach for the 
decision problem 3Sat. 

3.5.2 Solving 3SAT in Less than $2^n$ Complexity

We consider the decision problem (3Sat, $\Sigma_{logic}$), i.e., to decide whether a
given formula $F$ in 3CNF is satisfiable. If $F$ is over $n$ variables, the naive
approach computes the value $F(\alpha)$ for every assignment $\alpha$ to the variables
of $F$, which leads to the (worst case) complexity¹⁶ $O(|F| \cdot 2^n)$. Obviously, this simple
algorithm works for the general Sat problem. In what follows we show that,
by carefully using the divide-and-conquer method instead of this
exhaustive search, 3Sat can be decided in $O(|F| \cdot 1.84^n)$ time.

Let $F$ be a formula in CNF, and let $l$ be a literal that occurs in $F$. Then
$F(l = 1)$ denotes the formula obtained from $F$ by consecutively applying
the following rules:

(i) All clauses containing the literal $l$ are removed from $F$.

(ii) If a clause of $F$ contains the literal $\bar{l}$ and at least one literal different
from $\bar{l}$, then $\bar{l}$ is removed from the clause.

16 Note that evaluating a CNF $F$ for a given assignment to its variables can be done
in $O(|F|)$ time, and that there are exactly $2^n$ different assignments to the $n$ variables
of $F$.







(iii) If a clause of $F$ consists of the literal $\bar{l}$ only, then $F(l = 1)$ is the formula
0 (i.e., an unsatisfiable formula).

Analogously, $F(l = 0)$ denotes the formula obtained from $F$ in the
following way:

(i) All clauses containing the literal $\bar{l}$ are removed from $F$.

(ii) If a clause consists of at least two different literals and one of them is $l$,
then $l$ is removed from the clause.

(iii) If a clause contains the literal $l$ only, then $F(l = 0)$ is the formula 0.
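Rules (i)-(iii) translate directly into code. In this Python sketch, a clause is a frozenset of nonzero integers, with +i standing for $x_i$ and -i for $\bar{x}_i$; this encoding is our own. `None` stands for the formula 0. Since $F(l = 0)$ is the same as $F(\bar{l} = 1)$, a single function handles both cases. The formula `G` is a hypothetical example, not the formula $F$ of the text.

```python
from typing import FrozenSet, List, Optional

Clause = FrozenSet[int]  # literal +i is x_i, literal -i is the negation of x_i


def assign(formula: List[Clause], lit: int, value: int) -> Optional[List[Clause]]:
    """Return F(lit = value) following rules (i)-(iii); None stands for the formula 0."""
    true_lit = lit if value == 1 else -lit  # the literal that the assignment makes true
    result: List[Clause] = []
    for clause in formula:
        if true_lit in clause:              # rule (i): the clause is satisfied
            continue
        if -true_lit in clause:             # the clause contains a literal made false
            if len(clause) == 1:            # rule (iii): nothing else is left
                return None
            clause = clause - {-true_lit}   # rule (ii): drop the false literal
        result.append(clause)
    return result


# A hypothetical example formula (not the formula F of the text).
G = [frozenset(c) for c in ([1, -2, 4], [-2], [-2, 3, -5], [1, 5], [-1, 2, -3])]
print(assign(G, 1, 1))
print(assign(G, 2, 1))  # None: the unit clause (not x_2) is falsified
```

Applying `assign` repeatedly, one literal after the other, yields formulas of the form $F(l_1 = 1, \ldots, h_d = 0)$.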

In general, for literals $l_1, \ldots, l_c, h_1, \ldots, h_d$, the formula

$$F(l_1 = 1, l_2 = 1, \ldots, l_c = 1, h_1 = 0, h_2 = 0, \ldots, h_d = 0)$$

is obtained from $F$ by constructing $F(l_1 = 1)$, then $F(l_1 = 1)(l_2 = 1)$,
and so on. Obviously, $l_1 = 1, l_2 = 1, \ldots, l_c = 1, h_1 = 0, \ldots, h_d = 0$
determine a partial assignment to the variables of $F$. Thus, the question
whether $F(l_1 = 1, \ldots, l_c = 1, h_1 = 0, \ldots, h_d = 0)$ is satisfiable
is equivalent to the question whether there exists an assignment satisfying both
$l_1 = 1, \ldots, l_c = 1, h_1 = 0, \ldots, h_d = 0$ and $F$.

Consider the following formula 

F = (x 1 V x 2 V £4) A (T 2 ) A (£ 2 V £3 V £5) A (£1 V £5) A (xi V £ 2 V £ 3 ) 

of five variables $x_1, x_2, x_3, x_4, x_5$ in 3CNF. We see that

F(x 1 = 1 ) = (£2) A (£ 2 V £3 V £5) A (£ 2 V £3), 

F (£ 2 = 1) = (£1 V £ 5 ) A (£1 V £ 3 ), and 

F (£ 3 = 0 ) = (£1 V £ 2 V £4) A (£ 2 ) A (£ 2 V £5) A (£1 V £5). 

Observe that $F(\bar{x}_2 = 0) = 0$ and $F(x_2 = 1) = 0$, because the second clause
consists of the literal $\bar{x}_2$ only. The important fact is that, for every literal $l$ of
$F$ and every $a \in \{0, 1\}$, $F(l = a)$ contains fewer variables than $F$. In what follows,
for all positive integers $n$ and $r$, let

$$\mathrm{3CNF}(n, r) = \{\Phi \mid \Phi \text{ is a formula in 3CNF over at most } n \text{ variables with at most } r \text{ clauses}\}.$$

Consider the following divide-and-conquer strategy for 3Sat. Let
$F \in \mathrm{3CNF}(n, r)$ for some positive integers $n, r$, and let $(l_1 \lor l_2 \lor l_3)$ be some clause
of $F$. Observe that

$$F \text{ is satisfiable} \iff \text{at least one of the formulae } F(l_1 = 1),\ F(l_1 = 0, l_2 = 1),\ F(l_1 = 0, l_2 = 0, l_3 = 1) \text{ is satisfiable.} \qquad (3.8)$$







Following the above rules (i), (ii), and (iii), it is obvious that

$$F(l_1 = 1) \in \mathrm{3CNF}(n - 1, r - 1),$$
$$F(l_1 = 0, l_2 = 1) \in \mathrm{3CNF}(n - 2, r - 1),$$
$$F(l_1 = 0, l_2 = 0, l_3 = 1) \in \mathrm{3CNF}(n - 3, r - 1).$$

In this way we have reduced the question whether a formula $F$ from $\mathrm{3CNF}(n, r)$
is satisfiable to the satisfiability problem for three subinstances of $F$, from
$\mathrm{3CNF}(n - 1, r - 1)$, $\mathrm{3CNF}(n - 2, r - 1)$, and $\mathrm{3CNF}(n - 3, r - 1)$. Thus, we
obtain the following recursive algorithm for 3Sat.

Algorithm 3.5.2.1 (D&C-3SAT(F)).

Input: A formula F in 3CNF.

Step 1: if F ∈ 3CNF(3, k) or F ∈ 3CNF(m, 2) for some m, k ∈ ℕ − {0},
          then decide whether F ∈ 3Sat or not by testing all assignments to
          the variables of F;
          if F ∈ 3Sat output(1) else output(0).

Step 2: Let H be one of the shortest clauses of F.
          if H = (l) then output(D&C-3SAT(F(l = 1)));
          if H = (l1 ∨ l2)
             then output(D&C-3SAT(F(l1 = 1))
                         ∨ D&C-3SAT(F(l1 = 0, l2 = 1)));
          if H = (l1 ∨ l2 ∨ l3)
             then output(D&C-3SAT(F(l1 = 1))
                         ∨ D&C-3SAT(F(l1 = 0, l2 = 1))
                         ∨ D&C-3SAT(F(l1 = 0, l2 = 0, l3 = 1))).

Theorem 3.5.2.2. The algorithm D&C-3SAT solves 3Sat and

$$Time_{\text{D\&C-3SAT}}(F) = O(r \cdot 1.84^n)$$

for every $F \in \mathrm{3CNF}(n, r)$.

Proof. The fact that the algorithm D&C-3SAT solves 3Sat is obvious. If $F$
contains a clause consisting of one literal $l$ only, then clearly $F$ is
satisfiable iff $F(l = 1)$ is satisfiable. If $F$ contains a clause $(l_1 \lor l_2)$, then
$F$ is satisfiable iff $F(l_1 = 1)$ or $F(l_1 = 0, l_2 = 1)$ is satisfiable. In the case
when $F$ consists of clauses of length 3 only, the equivalence (3.8) confirms the
correctness of D&C-3SAT.

Now let us analyze the complexity $T(n, r) = Time_{\text{D\&C-3SAT}}(n, r)$ of the
algorithm D&C-3SAT with respect to

• the number of variables $n$, and

• the number of clauses $r$.







Obviously, $|F|/3 \le r \le |F|$ and $n \le |F|$ for every formula $F$ in 3CNF over $n$
variables that consists of $r$ clauses.¹⁷ Because of Step 1,

$$T(n, r) \le 8 \cdot |F| \le 24r \quad \text{for all pairs of integers } n, r \text{ with } n \le 3 \text{ or } r \le 2. \qquad (3.9)$$

For $n \in \{1, 2\}$ we may assume $T(2, r) \le 12r$ and $T(1, r) \le 3r$.

Since $F(l = a)$ can be constructed from $F$ in time $3 \cdot |F| \le 9r$, and Step 2
constructs at most six such formulas ($6 \cdot 9r = 54r$), the analysis of Step 2
yields, for all $n > 3$ and $r > 2$,

$$T(n, r) \le 54r + T(n - 1, r - 1) + T(n - 2, r - 1) + T(n - 3, r - 1). \qquad (3.10)$$

We prove by induction on $n$ that

$$T(n, r) \le 27r \cdot (1.84^n - 1) \qquad (3.11)$$

satisfies the recurrence (3.10), and that (3.9) holds for small $n$ and $r$.
First of all, observe that for $n = 3$ and any $r \ge 1$,

$$27r \cdot (1.84^3 - 1) > 100r,$$

which is surely more than $24r$. So Algorithm 3.5.2.1 runs within the
bound (3.11) for $n = 3$. The cases $n = 1, 2$ can be shown in a similar
way. Thus, (3.9) is satisfied by the bound (3.11).

Now, starting with $n = 4$, we prove by induction on $n$ that $T(n, r)$
given by formula (3.11) satisfies the recurrence (3.10).

(1) For every $r > 2$,

$$
\begin{aligned}
T(4, r) &\le 54r + T(3, r - 1) + T(2, r - 1) + T(1, r - 1) && \text{by (3.10)}\\
&\le 54r + 24(r - 1) + 12(r - 1) + 3(r - 1)\\
&\le 93r < 27r \cdot (1.84^4 - 1).
\end{aligned}
$$

(2) Let $T(m, r) \le 27r \cdot (1.84^m - 1)$ satisfy the recurrence (3.10) for all $m < n$.
We now prove (3.11) for $n$. For every positive integer $r > 2$,

$$
\begin{aligned}
T(n, r) &\le 54r + T(n-1, r-1) + T(n-2, r-1) + T(n-3, r-1) && \text{by (3.10)}\\
&\le 54r + 27(r-1)(1.84^{n-1} - 1) + 27(r-1)(1.84^{n-2} - 1) + 27(r-1)(1.84^{n-3} - 1) && \text{by induction}\\
&\le 54r + 27r \cdot (1.84^{n-1} + 1.84^{n-2} + 1.84^{n-3} - 3)\\
&= 27r \cdot 1.84^{n-1} \left(1 + \frac{1}{1.84} + \frac{1}{1.84^2}\right) - 3 \cdot 27r + 54r\\
&\le 27r \cdot 1.84^n - 27r = 27r \cdot (1.84^n - 1).
\end{aligned}
$$

The third step holds because the term in parentheses is positive. The last step holds
because $1 + 1/1.84 + 1/1.84^2 \approx 1.8388 < 1.84$.

17 For simplicity, consider |F| as the number of literals in F. 




3.6 Local Search 189 



Thus, there is a function $T(n, r) \in O(r \cdot 1.84^n)$ that satisfies the recurrence
(3.10). □
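The constant 1.84 comes from the recurrence. The largest real root of $x^3 = x^2 + x + 1$, the characteristic equation of $T(n) = T(n-1) + T(n-2) + T(n-3)$, is about 1.8393. The Python sketch below checks this root numerically. It also checks, for small $n$ and $r$, that the values allowed by (3.9), the base values above and the recurrence (3.10) stay below (3.11), and that $1 + 1/1.84 + 1/1.84^2 < 1.84$ holds. This is only a sanity check under the base-case values stated in the proof, not a replacement for the proof.

```python
from functools import lru_cache


def characteristic_root(tol: float = 1e-12) -> float:
    """Largest real root of x^3 = x^2 + x + 1, by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0  # p(1) = -2 < 0 and p(2) = 1 > 0 for p(x) = x^3 - x^2 - x - 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** 3 - mid ** 2 - mid - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


@lru_cache(maxsize=None)
def T(n: int, r: int) -> int:
    """Largest value allowed by the base cases (3.9) and the recurrence (3.10)."""
    if n <= 3 or r <= 2:
        return {1: 3 * r, 2: 12 * r}.get(n, 24 * r)
    return 54 * r + T(n - 1, r - 1) + T(n - 2, r - 1) + T(n - 3, r - 1)


def bound(n: int, r: int) -> float:
    """Right-hand side of (3.11)."""
    return 27 * r * (1.84 ** n - 1)


print(round(characteristic_root(), 4))  # 1.8393
```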

Exercise 3.5.2.3. Extend the idea of Algorithm 3.5.2.1 to any $k > 3$ in such
a way that the resulting algorithm for kSat has complexity $O((c_k)^n)$,
where $c_k < c_{k+1} < 2$ for every $k > 3$. □

Exercise 3.5.2.4. Improve the algorithm D&C-3SAT in order to obtain
an $O(r \cdot 1.64^n)$ algorithm for 3Sat. □

In Section 5.3.7 we shall combine the concept of lowering the worst case
complexity of exponential algorithms with randomization in order to design a
randomized $O(r \cdot (1.334)^n)$ algorithm for 3Sat.

Summary of Section 3.5 

The concept of lowering the worst case complexity of exponential algorithms focuses
on the design of algorithms that may even have an average case exponential
complexity. But the worst case complexity of these algorithms should be bounded
by an exponential function in $O(c^n)$ for some constant $c < 2$. Such algorithms
can be practical because they run fast for realistic input sizes of many particular 
applications (Figure 3.12). The typical idea behind the design of such algorithms is 
to search for a solution in a space of an exponential size, but this space should be 
much smaller than the space searched by any naive approach. 

This concept has been especially successful for the satisfiability problems. Instead
of looking at all $2^n$ assignments to the $n$ variables of a given formula $\Phi$, one decides
the satisfiability of $\Phi$ by examining at most $O(c^n)$ assignments for some $c < 2$.
The algorithm D&C-3SAT is the simplest known example for an application of this 
concept. 



3.6 Local Search 

3.6.1 Introduction and Basic Concept 

Local search is an algorithm design technique for optimization problems. The 
main aim of Section 3.6 is to give a more detailed and formal presentation 
of this technique than the rough presentation of basic ideas of local search in 
Section 2.3.4. This is important because of the following two reasons: 

(i) The fundamental framework of the classical local search method presented 
here is the base for advanced local search strategies such as simulated 
annealing (Chapter 6) or tabu search. 

(ii) A formal framework for the study of local search algorithms enables com- 
plexity considerations that provide borders on the tractability of searching 
for local optima of optimization problems. 







As already mentioned in Section 2.3.4, a local search algorithm realizes a
restricted search in the set $M(I)$ of all feasible solutions to the given problem
instance $I$. To determine what a “restricted search” means, one needs
to introduce a structure on the set of feasible solutions $M(I)$ by defining a
neighborhood for every feasible solution of $M(I)$.

Definition 3.6.1.1. Let $U = (\Sigma_I, \Sigma_O, L, L_I, M, cost, goal)$ be an optimization
problem. For every $x \in L_I$, a neighborhood on $M(x)$ is any mapping
$f_x : M(x) \to Pot(M(x))$ such that

(i) $\alpha \in f_x(\alpha)$ for every $\alpha \in M(x)$,

(ii) if $\beta \in f_x(\alpha)$ for some $\alpha \in M(x)$, then $\alpha \in f_x(\beta)$, and

(iii) for all $\alpha, \beta \in M(x)$ there exist a positive integer $k$ and $\gamma_1, \ldots, \gamma_k \in M(x)$
such that $\gamma_1 \in f_x(\alpha)$, $\gamma_{i+1} \in f_x(\gamma_i)$ for $i = 1, \ldots, k - 1$, and $\beta \in f_x(\gamma_k)$.

If $\alpha \in f_x(\beta)$ for some $\alpha, \beta \in M(x)$, we say that $\alpha$ and $\beta$ are neighbors in
$M(x)$. The set $f_x(\alpha)$ is called the neighborhood of the feasible solution
$\alpha$ in $M(x)$. The (undirected) graph

$$G_{M(x), f_x} = (M(x), \{\{\alpha, \beta\} \mid \alpha \in f_x(\beta),\ \alpha \ne \beta,\ \alpha, \beta \in M(x)\})$$

is the neighborhood graph of $M(x)$ according to the neighborhood
$f_x$.

Let, for every $x \in L_I$, $f_x$ be a neighborhood on $M(x)$. The function
$f : \bigcup_{x \in L_I} (\{x\} \times M(x)) \to \bigcup_{x \in L_I} Pot(M(x))$ with the property $f(x, \alpha) = f_x(\alpha)$
for every $x \in L_I$ and every $\alpha \in M(x)$ is called a neighborhood for $U$.

Local search in $M(x)$ is an iterative movement in $M(x)$ from a feasible solution
to a neighboring feasible solution. This shows the importance of
condition (iii) of Definition 3.6.1.1. It ensures that the neighborhood graph
$G_{M(x), f_x}$ is connected, i.e., every feasible solution $\beta \in M(x)$ is reachable from
any solution $\alpha \in M(x)$ by iteratively moving from neighbor to neighbor.
The following exercise shows that the conditions (i), (ii), and (iii) on the
neighborhood function even determine a metric on $M(x)$, so the pair
$(M(x), f_x)$ can be considered a metric space.

Exercise 3.6.1.2. Let $U = (\Sigma_I, \Sigma_O, L, L_I, M, cost, goal)$ be an optimization
problem, and let $Neigh_x$ be a neighborhood on $M(x)$ for some $x \in L_I$. Define,
for all $\alpha, \beta \in M(x)$, $distance_{Neigh_x}(\alpha, \beta)$ as the length of the shortest path
between $\alpha$ and $\beta$ in $G_{M(x), Neigh_x}$. Prove that $distance_{Neigh_x}$ is a metric on
$M(x)$. □
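For a finite set of feasible solutions, conditions (i)-(iii) can be checked directly. The Python sketch below does this for the flip neighborhood of Max-Sat on 3 variables, the neighborhood of Exercise 3.6.1.3. It computes $distance_{Neigh_x}$ by breadth-first search; for this neighborhood the distance equals the Hamming distance. The sketch only illustrates Exercise 3.6.1.2 on one example and is not a proof. The function names are our own.

```python
from collections import deque
from itertools import product


def flip_neighborhood(alpha):
    """f(alpha) for Max-Sat: alpha itself and all assignments differing in one bit."""
    return {alpha} | {alpha[:i] + (1 - alpha[i],) + alpha[i + 1:] for i in range(len(alpha))}


def bfs_distances(f, alpha):
    """Shortest-path lengths from alpha in the neighborhood graph of f."""
    dist = {alpha: 0}
    queue = deque([alpha])
    while queue:
        a = queue.popleft()
        for b in f(a):
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def is_neighborhood(space, f):
    """Check conditions (i)-(iii) of Definition 3.6.1.1 on a finite set of solutions."""
    space = set(space)
    if any(not f(a) <= space for a in space):
        return False                                          # f must map into Pot(space)
    if any(a not in f(a) for a in space):
        return False                                          # (i)
    if any(a not in f(b) for a in space for b in f(a)):
        return False                                          # (ii)
    return set(bfs_distances(f, next(iter(space)))) == space  # (iii), given (ii)


space = list(product((0, 1), repeat=3))
print(is_neighborhood(space, flip_neighborhood))  # True
print(is_neighborhood(space, lambda a: {a}))      # False: violates (iii)
```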

Because of condition (ii) of Definition 3.6.1.1, any neighborhood on $M(x)$
can be viewed as a symmetric relation on $M(x)$. When one wants to define
a neighborhood on $M(x)$ in a practical application, one usually does
not work in the formalism of functions or relations. The standard way to
introduce a neighborhood on $M(x)$ is to use so-called local transformations
on $M(x)$. Informally, a local transformation transforms a feasible solution $\alpha$
into a feasible solution $\beta$ by some local changes of the specification of $\alpha$. For
instance, the exchange of two edges in the local search algorithm for the
minimum spanning tree problem in Example 2.3.4.6 is a local transformation.
Flipping the Boolean value assigned to a variable is a
local transformation for the Max-Sat problem. Given a set $T$ of reasonable
transformations, one can define the neighborhood according to $T$:
$\alpha$ is a neighbor of $\beta$ if there exists a transformation $t \in T$ that
transforms $\alpha$ into $\beta$. Typically, the local transformations are used to define a
symmetric relation on $M(x)$, so one only has to take care of the reachability
property (iii) of Definition 3.6.1.1.

Note that sometimes one uses neighborhoods that do not satisfy all the
conditions (i), (ii), and (iii) of Definition 3.6.1.1. If condition (ii) is violated,
then one has to define the neighborhood graph as a directed graph.

Exercise 3.6.1.3. Prove that the transformation of flipping the value of at
most one variable defines a neighborhood on $M(\Phi)$ for every input instance
$\Phi$ of Max-Sat. □

Introducing a neighborhood on $M(x)$ enables one to speak about local optima
in $M(x)$. Observe that this is impossible without imposing some structure
on the set $M(x)$ of all feasible solutions to $x$.

Definition 3.6.1.4. Let $U = (\Sigma_I, \Sigma_O, L, L_I, M, cost, goal)$ be an optimization
problem, and let, for every $x \in L_I$, the function $f_x$ be a neighborhood on
$M(x)$. A feasible solution $\alpha \in M(x)$ is a local optimum for the input
instance $x$ of $U$ according to $f_x$ if

$$cost(\alpha) = goal\{cost(\beta) \mid \beta \in f_x(\alpha)\}.$$

We denote the set of all local optima for $x$ according to the neighborhood $f_x$
by $LocOPT_U(x, f_x)$.

Given a structure on $M(x)$ determined by a neighborhood $Neigh_x$ for
every $x \in L_I$, one can describe a general scheme of local search as follows.
Roughly speaking, a local search algorithm starts with an initial solution
and then repeatedly tries to find a better solution by searching neighborhoods.
If there is no better solution in the neighborhood, it stops.

LSS(Neigh) - Local Search Scheme according to a neighborhood Neigh

Input: An input instance x of an optimization problem U.

Step 1: Find a feasible solution α ∈ M(x).

Step 2: while α ∉ LocOPT_U(x, Neigh_x) do
          begin find a β ∈ Neigh_x(α) such that
            cost(β) < cost(α) if U is a minimization problem and
            cost(β) > cost(α) if U is a maximization problem;
            α := β
          end

Output: output(α).
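The scheme LSS(Neigh) can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: it assumes that the caller has already performed Step 1 and supplies the neighborhood as a function `neigh` together with `cost` and `goal`; these names and the toy instance at the end are hypothetical. Step 2 is realized with the first improvement strategy, and the function also counts the improvements.

```python
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")  # type of the feasible solutions


def better(a: float, b: float, goal: str) -> bool:
    """True if cost a is strictly better than cost b for the given goal."""
    return a < b if goal == "minimum" else a > b


def lss(alpha: S,
        neigh: Callable[[S], Iterable[S]],
        cost: Callable[[S], float],
        goal: str = "minimum") -> Tuple[S, int]:
    """Step 2 of LSS(Neigh): move to a cost-improving neighbor until none
    exists. Returns the local optimum and the number of improvements."""
    improvements = 0
    while True:
        current = cost(alpha)
        beta = next((b for b in neigh(alpha)
                     if better(cost(b), current, goal)), None)
        if beta is None:  # alpha is a local optimum according to neigh
            return alpha, improvements
        alpha = beta
        improvements += 1


# Hypothetical toy instance: minimize (z - 7)^2 over the integers 0..20,
# where the neighbors of z are z - 1 and z + 1.
def line_neigh(z: int) -> Iterable[int]:
    return [w for w in (z - 1, z + 1) if 0 <= w <= 20]


if __name__ == "__main__":
    print(lss(0, line_neigh, lambda z: (z - 7) ** 2))  # (7, 7)
```

Each improvement lowers the integer cost by at least 1, so the number of improvements never exceeds the initial cost (49 when starting from 0).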







The following theorem is a direct consequence of the way in which LSS 
works. 

Theorem 3.6.1.5. Any local search algorithm based on LSS(Neigh) for an
optimization problem U outputs, for every instance x of U, a local optimum
for x according to the neighborhood Neigh.

The success of a local search algorithm mainly depends on the choice of the
neighborhood. If a neighborhood Neigh has the property that Neigh_x(α) has
a small cardinality for every α ∈ M(x), then one iterative improvement of
Step 2 of LSS(Neigh) can be executed efficiently, but the risk that there are
many local optima (potentially with a cost that is very far from Opt_U(x))
grows substantially. On the other hand, a large |Neigh_x(α)| can lead to
feasible solutions with costs that are closer to Opt_U(x) than small
neighborhoods can, but the complexity of the execution of one run of the
while loop in Step 2 can increase too much. Thus, the choice of the
neighborhood is always a tradeoff between the time complexity and the quality
of the solution. Small neighborhoods are typical for most applications. But a
small neighborhood alone does not provide any assurance of the efficiency of
the local search algorithm, because the number of runs of the while loop of the
local search scheme can be exponential in the input size. We can only guarantee
a pseudo-polynomial-time algorithm for finding a local optimum of optimization
problems for which the costs of feasible solutions are integers whose values are
bounded by p(Max-Int(x)) for a polynomial p. The argument for this claim
is simple:

(i) if the costs are integers, then each run of the while loop improves the cost
of the best found solution by at least 1, and

(ii) if the costs are in the set {1, 2, ..., p(Max-Int(x))} of positive integers,
then at most p(Max-Int(x)) repetitions of the while loop are possible.

Theorem 3.6.1.6. Let U = (Σ_I, Σ_O, L, L_I, M, cost, goal) be an
integer-valued optimization problem with the cost function cost from feasible
solutions to positive integers. Let there exist a polynomial p such that
cost(α, x) ≤ p(Max-Int(x)) for every x ∈ L_I and every α ∈ M(x). For every
neighborhood Neigh such that Neigh_x(α) can be generated from α and x in
polynomial time in |x| for every x ∈ L_I and every α ∈ M(x), LSS(Neigh)
provides a pseudo-polynomial-time algorithm that finds a local optimum
according to the neighborhood Neigh.

Besides the choice of the neighborhood, the following two free parameters
of LSS(Neigh) may influence the success of the local search:

(1) In Step 1 of LSS(Neigh) an initial feasible solution is computed. This can
be done in different ways. For instance, it can be randomly chosen or it
can be precomputed by any other algorithmic method. The choice of an
initial solution can strongly influence the quality of the resulting local
optimum. This is the reason why one sometimes performs LSS(Neigh)
several times, starting with different initial feasible solutions. There is no
general theory providing a concept for the choice of the initial feasible
solution, and so one usually generates them randomly. The execution of
several runs of LSS(Neigh) from randomly chosen initial feasible solutions
is called multistart local search.

(2) There are several ways to choose the cost-improving feasible solution in
Step 2. The two basic ways are the strategy of the first improvement
and the strategy of the best improvement. The first improvement strategy
means that the current feasible solution is replaced by the first
cost-improving feasible solution found by the neighborhood search. The best
improvement strategy means that the current feasible solution is replaced
by the best feasible solution in its neighborhood. Obviously, a single run of
the while loop can be faster with the first improvement strategy than with
the best improvement strategy, but the best improvement strategy may need
fewer runs of the while loop than the first improvement strategy. Note that
these two strategies may lead to very different results for the same initial
feasible solution.
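The following sketch contrasts the two strategies and adds multistart local search. It assumes a minimization problem and hypothetical toy instances on the integers 0, ..., 100: `jump_neigh` allows steps of length 1 and 10, and `two_basins` is a cost function with a poor local minimum at 20 (cost 5) and the global minimum at 80 (cost 0). The neighbor order in `jump_neigh` is an arbitrary choice; it decides which neighbor the first improvement strategy takes.

```python
import random
from typing import Callable, Iterable, List, Tuple

Neigh = Callable[[int], Iterable[int]]
Cost = Callable[[int], int]


def first_improvement(alpha: int, neigh: Neigh, cost: Cost) -> Tuple[int, int]:
    """Move to the first cheaper neighbor; return (local optimum, runs)."""
    runs = 0
    while True:
        beta = next((b for b in neigh(alpha) if cost(b) < cost(alpha)), None)
        if beta is None:
            return alpha, runs
        alpha, runs = beta, runs + 1


def best_improvement(alpha: int, neigh: Neigh, cost: Cost) -> Tuple[int, int]:
    """Move to a cheapest neighbor as long as it is cheaper than alpha."""
    runs = 0
    while True:
        beta = min(neigh(alpha), key=cost, default=alpha)
        if cost(beta) >= cost(alpha):
            return alpha, runs
        alpha, runs = beta, runs + 1


def multistart(starts: Iterable[int], search, neigh: Neigh, cost: Cost) -> int:
    """Run the local search from every start and keep the best local optimum."""
    return min((search(s, neigh, cost)[0] for s in starts), key=cost)


def jump_neigh(z: int) -> List[int]:
    return [w for w in (z + 1, z - 1, z + 10, z - 10) if 0 <= w <= 100]


def step_neigh(z: int) -> List[int]:
    return [w for w in (z - 1, z + 1) if 0 <= w <= 100]


def two_basins(z: int) -> int:
    return min(abs(z - 20) + 5, abs(z - 80))


if __name__ == "__main__":
    target = lambda z: abs(z - 40)
    print(first_improvement(0, jump_neigh, target))  # (40, 40)
    print(best_improvement(0, jump_neigh, target))   # (40, 4)
    starts = random.Random(1).sample(range(101), 5)  # random initial solutions
    print(multistart(starts, first_improvement, step_neigh, two_basins))
```

On the cost |z − 40|, the first improvement strategy needs 40 runs of the while loop from 0, whereas the best improvement strategy needs only 4, each of which examines the whole neighborhood. With `two_basins`, a single run from 10 or 30 ends in the poor local minimum 20, while a multistart that also begins at 90 finds 80.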

The rest of this section is organized as follows. Section 3.6.2 shows some
examples of neighborhoods for distinct optimization problems and introduces
the so-called Kernighan-Lin variable-depth search, which can be viewed as an
advanced form of local search. Section 3.6.3 discusses the limits of
applicability of local search schemes for searching for optima of optimization
problems.

3.6.2 Examples of Neighborhoods and 
Kernighan-Lin’s Variable-Depth Search 

A local search algorithm is determined by the definition of the neighborhood
and by the local search scheme presented in Section 3.6.1. Thus, once one
chooses a neighborhood, the local search algorithm is more or less determined.
This is why, in this section, we present examples of suitable neighborhoods
for some optimization problems rather than examples of algorithms.

The most famous neighborhoods for TSP are the so-called 2-Exchange
and 3-Exchange.¹⁸ The simplest way to define them is to describe the
corresponding local transformations. A 2-Exchange local transformation consists
of removing two edges {a, b} and {c, d} with |{a, b, c, d}| = 4 from a given
Hamiltonian tour α that visits these 4 vertices in the order a, b, d, c, and of
adding the two edges {a, d} and {b, c} to α. We observe (Figure 3.13) that the
resulting object is again a Hamiltonian tour (cycle).

18 Note that the notation 2-Opt and 3-Opt instead of 2-Exchange and 3-Exchange is
also frequently used in the literature. We choose not to use this notation because
it is similar to our notation Opt_U(x) for the optimal cost.




Fig. 3.13. 2-Exchange local transformation. The edges {a, b} and {c, d} are replaced
by the edges {a, d} and {b, c}.



For every input instance (K_n, c) of TSP, n ∈ ℕ − {0}, and for every
α ∈ M(K_n, c), the size of the neighborhood 2-Exchange(α) is n(n − 3)/2, which
is in Ω(n²).
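The 2-Exchange neighborhood can be enumerated by reversing a segment of the tour, a standard way to realize this transformation. In the sketch below (hypothetical helpers; tours are Python lists of vertices), for positions i < j the labels a = tour[i], b = tour[i+1], d = tour[j] and c = tour[(j+1) mod n] match the notation of Figure 3.13: the edges {a, b} and {c, d} are removed and {a, d} and {b, c} are added. Pairs of tour edges that share a vertex are skipped, so the enumeration yields exactly n(n − 3)/2 neighbors.

```python
from typing import FrozenSet, Iterator, List, Set


def two_exchange_neighbors(tour: List[int]) -> Iterator[List[int]]:
    """Yield every tour in 2-Exchange(tour).

    Reversing tour[i+1..j] removes {tour[i], tour[i+1]} and
    {tour[j], tour[(j+1) % n]} and adds {tour[i], tour[j]} and
    {tour[i+1], tour[(j+1) % n]}.
    """
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # these two tour edges share the vertex tour[0]
            yield tour[: i + 1] + tour[i + 1 : j + 1][::-1] + tour[j + 1 :]


def tour_edges(tour: List[int]) -> Set[FrozenSet[int]]:
    """The n undirected edges of a Hamiltonian tour."""
    n = len(tour)
    return {frozenset((tour[k], tour[(k + 1) % n])) for k in range(n)}


if __name__ == "__main__":
    print(next(two_exchange_neighbors([0, 1, 2, 3, 4])))  # [0, 2, 1, 3, 4]
```

Every generated tour shares exactly n − 2 edges with the original one, so each neighbor corresponds to one pair of nonadjacent removed edges.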

Similarly, a 3-Exchange local transformation starts by removing three edges
{a, b}, {c, d}, and {e, f} with |{a, b, c, d, e, f}| = 6 from a Hamiltonian
tour α. Then there are several possibilities to add three edges in order to get
a Hamiltonian tour again. Some of these possibilities are depicted in Figure
3.14.

Observe that 2-Exchange(α) ⊆ 3-Exchange(α) for every Hamiltonian tour
α, because putting back one of the removed edges is not forbidden (see the last
case of Figure 3.14, where the originally removed edge {a, b} is contained in the
resulting tour), and so any 2-Exchange local transformation can be performed.
The cardinality of 3-Exchange(α) is in Ω(n³).

One can also consider the k-Exchange(α) neighborhood that is based on
the exchange of k edges. But k-Exchange neighborhoods are rarely used in
practical applications for k > 3, because the k-Exchange(α) neighborhoods
are too large. Observe that the number of possibilities to choose k edges is in
Ω(n^k), and additionally the number of possibilities to add new edges in order
to get a Hamiltonian tour grows exponentially with k. A report on experiments
with local search schemes using the neighborhoods 2-Exchange and 3-Exchange
is given in Chapter 7.

The most natural neighborhood for the maximum satisfiability problem
is based on flipping one bit of the current assignment. For a formula Φ of n
variables, this Flip neighborhood has cardinality exactly n. One can easily see
that the neighborhood graph G_{M(Φ),Flip} is the well-known n-dimensional hypercube.
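A sketch of LSS(Flip) for Max-Sat with the first improvement strategy follows. The clause encoding is an assumption borrowed from the common DIMACS convention (literal +i stands for x_i and −i for its negation), and the formula in the usage example is hypothetical. Every assignment produced by `flip` differs from the current one in exactly one bit, which is the hypercube structure mentioned above.

```python
from typing import Iterator, Sequence, Tuple

Assignment = Tuple[bool, ...]  # a[i - 1] is the value of x_i


def satisfied(clauses: Sequence[Sequence[int]], a: Assignment) -> int:
    """Number of clauses satisfied by the assignment a."""
    return sum(any(a[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)


def flip(a: Assignment) -> Iterator[Assignment]:
    """The Flip neighborhood: all assignments differing from a in one bit."""
    for i in range(len(a)):
        yield a[:i] + (not a[i],) + a[i + 1:]


def flip_local_search(clauses: Sequence[Sequence[int]], a: Assignment) -> Assignment:
    """LSS(Flip) for Max-Sat with the first improvement strategy."""
    while True:
        current = satisfied(clauses, a)
        b = next((b for b in flip(a) if satisfied(clauses, b) > current), None)
        if b is None:  # local optimum with respect to Flip
            return a
        a = b


if __name__ == "__main__":
    phi = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
    print(flip_local_search(phi, (False, False, False)))  # (False, True, False)
```

For Φ = (x_1 ∨ x_2) ∧ (¬x_1 ∨ x_3) ∧ (¬x_2 ∨ ¬x_3) ∧ (x_1 ∨ ¬x_3) and the all-false start (3 satisfied clauses), flipping x_1 does not help, flipping x_2 satisfies all four clauses, and the search stops there.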

For the cut problems, there is a very simple way to define a neighborhood.
For a given cut (V_1, V_2) one simply moves one vertex from V_1 to V_2
(assuming V_1 contains at least two vertices) or from V_2 to V_1 (assuming V_2
contains at least two vertices). This neighborhood is of linear size, and one can
extend it to larger neighborhoods by moving several vertices between V_1 and V_2.

Fig. 3.14. Examples of the 3-Exchange local transformation. The edges {a, b},
{c, d}, and {e, f} are replaced by three other edges. The last figure shows that
replacing only two edges is also allowed. In this case the edges {e, f} and {c, d} are
replaced by the edges {e, c} and {d, f}.
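The one-vertex move neighborhood for cuts can be sketched in the same style. The encoding is an assumption: vertices are 0, ..., n − 1, and a cut is a tuple whose entry v is 0 if v ∈ V_1 and 1 if v ∈ V_2. The hypothetical example on the 4-cycle shows how easily local search gets stuck: from ({0, 1}, {2, 3}) every single move keeps the cut value at 2, although the cut ({0, 2}, {1, 3}) has value 4.

```python
from typing import Iterator, Sequence, Tuple

Edge = Tuple[int, int]
Cut = Tuple[int, ...]


def cut_value(edges: Sequence[Edge], side: Cut) -> int:
    """Number of edges between V_1 (side 0) and V_2 (side 1)."""
    return sum(side[u] != side[v] for u, v in edges)


def move_neighbors(side: Cut) -> Iterator[Cut]:
    """Move one vertex to the other side, keeping both sides nonempty."""
    size1 = sum(side)
    size0 = len(side) - size1
    for v, s in enumerate(side):
        if (size1 if s else size0) >= 2:
            yield side[:v] + (1 - s,) + side[v + 1:]


def max_cut_local_search(edges: Sequence[Edge], side: Cut) -> Cut:
    """LSS with the one-vertex move neighborhood and first improvement."""
    while True:
        current = cut_value(edges, side)
        better = next((t for t in move_neighbors(side)
                       if cut_value(edges, t) > current), None)
        if better is None:  # local optimum
            return side
        side = better


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(max_cut_local_search(square, (0, 0, 1, 1)))  # (0, 0, 1, 1)
    print(max_cut_local_search(square, (0, 1, 1, 1)))  # (0, 1, 0, 1)
```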

A crucial drawback of local search is the fact that LSS(Neigh) may get
stuck in very poor local optima.¹⁹ Large neighborhoods can be expected to
yield feasible solutions of higher quality, but the time complexity of verifying
local optimality becomes too large to be profitable.²⁰ To overcome the
difficulty with this tradeoff between time complexity and solution quality, we
introduce the so-called variable-depth search algorithm, which may perform
several local transformations in order to find an improvement at a larger
distance in G_{M(x),Neigh}, but does not perform an exhaustive search of all
feasible solutions within a large distance from the current solution.

19 That is, in local optima whose costs substantially differ from the cost of optimal
solutions.

20 The effort to get a better solution may be too large in comparison to the size
of its quality improvement.

Let us explain this concept in detail for an optimization problem U. Remember
that, when working with local transformations, we assume throughout that the
feasible solutions of U are specified as lists (p_1, ..., p_n) of n local
specifications, and a local transformation may change one or a few of them.
Let Neigh be a neighborhood for U that can be determined by a local
transformation. If one defines, for every positive integer k and every
α ∈ M(I) for an instance I of U,

$$\mathit{Neigh}_I^k(\alpha) = \{\beta \in M(I) \mid \mathit{distance}_{\mathit{Neigh}_I}(\alpha, \beta) \le k\}$$

as the set of feasible solutions that can be reached from α by at most k
applications of the local transformation, then there obviously exists an m
such that

$$\mathit{Neigh}_I^m(\alpha) = M(I)$$

for every α ∈ M(I). Typically, m is approximately equal to n, where n is
the length of the specification. For instance, Flip^n(α) = M(Φ) for Max-Sat,
and this property holds for all other neighborhoods introduced in this section.
The variable-depth search algorithm can enable us to efficiently find a solution
β from Neigh_I^n(α) as the next iterative improvement of the current feasible
solution α without performing an exhaustive search of Neigh_I^n(α). This is done
by the following greedy strategy.

Let, for any minimization [maximization] problem U and all feasible
solutions α, β ∈ M(I) for an instance I of U,

$$\mathit{gain}(\alpha, \beta) = \mathrm{cost}(\alpha) - \mathrm{cost}(\beta) \quad [\mathrm{cost}(\beta) - \mathrm{cost}(\alpha)].$$

Note that gain(α, β) may also be negative if β is not an improvement of
α. In order to escape from a local optimum of Neigh, the idea is to apply the
local transformation at most n times in such a way that:

(i) if starting with a feasible solution α = (p_1, p_2, ..., p_n), then the resulting
feasible solution γ = (q_1, q_2, ..., q_n) has the property q_i ≠ p_i for all
i = 1, ..., n,

(ii) if α_0, α_1, α_2, ..., α_m, where α = α_0 and α_m = γ, is the sequence of
created feasible solutions (α_{i+1} is obtained from α_i by the application of
one local transformation), then

  a) gain(α_i, α_{i+1}) = max{gain(α_i, δ) | δ ∈ Neigh(α_i)}
     (i.e., we use the greedy strategy to go from α_i to α_{i+1}), and

  b) if the step from α_i to α_{i+1} changes the parameter p_j of the initial
     feasible solution α to some q_j, then q_j is never changed later (i.e., q_j
     is a parameter of γ).

After creating the sequence α_0, α_1, ..., α_m, the algorithm replaces α by
an α_l such that

$$\mathit{gain}(\alpha, \alpha_l) = \max\{\mathit{gain}(\alpha, \alpha_i) \mid i = 1, \ldots, m\}$$

if gain(α, α_l) > 0. If gain(α, α_l) ≤ 0, then the algorithm halts with the output
α. The main idea of the above-described approach is that a few steps in
the wrong direction (when gain(α, α_1), gain(α_1, α_2), ..., gain(α_{r−1}, α_r) are
all negative) may ultimately be redeemed by a large step in the right direction
(gain(α_r, α_{r+1}) > |Σ_{i=0}^{r−1} gain(α_i, α_{i+1})|, for instance).

KL(Neigh)   Kernighan-Lin Variable-Depth Search Algorithm with
            respect to the neighborhood Neigh

Input: An input instance I of an optimization problem U.

Step 1: Generate a feasible solution α = (p_1, p_2, ..., p_n) ∈ M(I), where
        (p_1, p_2, ..., p_n) is such a parametric representation of α that the
        local transformation defining Neigh can be viewed as an exchange of a
        few of these parameters.

Step 2: IMPROVEMENT := TRUE;
        EXCHANGE := {1, 2, ..., n}; J := 0; α_J := α;
        while IMPROVEMENT = TRUE do
          begin
            while EXCHANGE ≠ ∅ do
              begin J := J + 1;
                α_J := a solution from Neigh(α_{J−1}) such that gain(α_{J−1}, α_J)
                       is the maximum of
                       {gain(α_{J−1}, δ) | δ ∈ Neigh(α_{J−1}) − {α_{J−1}} and δ differs
                        from α_{J−1} in the parameters of EXCHANGE only};
                EXCHANGE := EXCHANGE − {the parameters in which
                                        α_J and α_{J−1} differ}
              end;
            Compute gain(α, α_i) for i = 1, ..., J;
            Compute l ∈ {1, ..., J} such that
              gain(α, α_l) = max{gain(α, α_i) | i ∈ {1, 2, ..., J}};
            if gain(α, α_l) > 0 then
              begin α := α_l;
                EXCHANGE := {1, 2, ..., n}; J := 0; α_J := α
              end
            else IMPROVEMENT := FALSE
          end

Step 3: output(α).
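The following sketch instantiates KL(Neigh) for Max-Cut with the one-vertex move neighborhood; it is an illustration under stated assumptions, not a reference implementation. The parameters p_1, ..., p_n are the sides of the vertices, and each vertex may be moved at most once per round (the set `exchange` plays the role of EXCHANGE). Two details are our own assumptions: both sides of the cut are kept nonempty, so a round may end before every vertex has moved, and ties between equal gains are broken in favor of the smallest vertex index.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def cut_value(edges: Sequence[Edge], side: Sequence[int]) -> int:
    return sum(side[u] != side[v] for u, v in edges)


def kl_max_cut(edges: Sequence[Edge], n: int, side: Sequence[int]) -> Tuple[int, ...]:
    """Kernighan-Lin variable-depth search for Max-Cut (sketch)."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def gain(s: List[int], v: int) -> int:
        # Moving v cuts its edges to its own side and uncuts the others.
        same = sum(s[w] == s[v] for w in adj[v])
        return same - (len(adj[v]) - same)

    alpha = list(side)
    while True:                        # while IMPROVEMENT = TRUE
        s = alpha[:]                   # alpha_0 := alpha
        exchange = set(range(n))       # EXCHANGE := {1, ..., n}
        moves: List[int] = []
        total = best_total = best_len = 0
        while exchange:                # while EXCHANGE is nonempty
            movable = [v for v in exchange if s.count(s[v]) >= 2]
            if not movable:
                break
            # greedy step: the largest gain, which may be negative
            v = max(movable, key=lambda w: (gain(s, w), -w))
            total += gain(s, v)        # total = gain(alpha, alpha_J)
            s[v] = 1 - s[v]
            exchange.remove(v)
            moves.append(v)
            if total > best_total:
                best_total, best_len = total, len(moves)
        if best_total <= 0:            # no alpha_l with positive gain
            return tuple(alpha)
        for v in moves[:best_len]:     # alpha := alpha_l
            alpha[v] = 1 - alpha[v]


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(cut_value(square, kl_max_cut(square, 4, (0, 0, 1, 1))))  # 4
```

On the 4-cycle started from ({0, 1}, {2, 3}), where plain local search with the same neighborhood stops at cut value 2, the first round moves vertex 0 (gain 0) and then vertex 3 (gain 2), keeps this two-step prefix, and reaches the maximum cut value 4.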

An important point is that KL(Neigh) uses the greedy strategy to find
an improvement of the current feasible solution α in Neigh^l(α) for some
l ∈ {1, ..., n}, and so it avoids the running time |Neigh^l(α)|, which may be
exponential in |α|. In fact, the time complexity of the search for one iterative
improvement of α in KL(Neigh) is at most

$$n \cdot f(|\alpha|) \cdot |\mathit{Neigh}(\alpha)|,$$

where f is the complexity of the execution of one local transformation.

For the problems Min-Cut and Max-Cut and the neighborhood consisting
of the move of one vertex from V_1 to V_2 or vice versa, the complexity of
the local transformation is O(1), and so one iterative improvement of KL costs
only O(n²). We have the same situation for KL(Flip) for Max-Sat.

Note that in some applications one can consider a modification of
Kernighan-Lin's variable-depth search algorithm. For instance, an additional
exchange of an already changed parameter or the return to the original value
of a parameter may be allowed under some circumstances. If Neigh(α) is too
large (for instance, superlinear as in the case of 2-Exchange and 3-Exchange),
one can avoid the exhaustive search of Neigh(α) by using an efficient greedy
approach to find a candidate β ∈ Neigh(α) with a reasonable gain.



3.6.3 Tradeoffs Between Solution Quality and Complexity 

As we already noted, local search can be efficient for small neighborhoods, but
there are optimization problems such that, for every small neighborhood, one
can find hard input instances for which local search either provides arbitrarily
poor solutions or needs exponential time to compute even a local
optimum.²¹ Local search assures a good solution only if local optima according
to the given neighborhood have costs close to the optimal cost. As we will see
in Chapter 4, this can lead to the design of good approximation algorithms for
some optimization problems.

Assuming P ≠ NP, there is no polynomial-time local search algorithm for
any NP-hard optimization problem, because there is no polynomial-time
algorithm for any NP-hard problem at all. Thus, one does not need to develop
any special method to prove that an optimization problem cannot be solved
in polynomial time by local search. But when using local search for a hard
optimization problem, one can at least ask where the difficulty lies from the
local search point of view. We already observed that the time complexity of
any local search algorithm can be roughly bounded by

(time of the search in a local neighborhood) · (the number of improvements).

We pose the following question:

For which NP-hard optimization problems can one find a neighborhood
Neigh of polynomial size such that LSS(Neigh) always outputs an
optimal solution?

Note that such neighborhoods may exist for hard problems, because local
search algorithms with small neighborhoods may have an exponential
complexity due to a large number of iterative improvements needed in the worst
case. To formulate our question more precisely, we need the following
definition.

21 Such an example will be presented at the end of this section.



Definition 3.6.3.1. Let U = (Σ_I, Σ_O, L, L_I, M, cost, goal) be an optimization
problem, and let f be a neighborhood for U. f is called an exact
neighborhood if, for every x ∈ L_I, every local optimum for x according to f_x is
an optimal solution to x (i.e., LocOPT_U(x, f_x) is equal to the set of optimal
solutions for x).

A neighborhood f is called polynomial-time searchable if there is a
polynomial-time²² algorithm that, for every x ∈ L_I and every α ∈ M(x),
finds one of the best feasible solutions in f_x(α).

We note that if a neighborhood is polynomial-time searchable, this does not 
mean that this neighborhood is of polynomial size. There exist neighborhoods 
of exponential size that are searchable even in linear time. 

Thus, for a given optimization problem U ∈ NPO, our question can be
formulated in the terminology introduced in Definition 3.6.3.1 as follows:

Does there exist an exact polynomial-time searchable neighborhood for U ? 

A positive answer to this question means that the hardness of the problem
from the local search point of view lies in the number of iterative improvements
needed to reach an optimal solution. In many cases, it means that local
search may be suitable for U. For instance, if U is an integer-valued
optimization problem, then the existence of a polynomial-time searchable exact
neighborhood Neigh for U can imply that LSS(Neigh) is a pseudo-polynomial-time
algorithm²³ for U.

A negative answer to this question implies that no polynomial-time
searchable neighborhood can assure the success of local search in searching
for an optimal solution. In such a case, one can at best try to obtain a local
optimum in polynomial time.

We present two techniques for proving that an optimization problem is 
hard for local search in the above-mentioned sense. Both are based on suitable 
polynomial-time reductions. 

Definition 3.6.3.2. Let U = (Σ_I, Σ_O, L, L_I, M, cost, goal) be an
integer-valued optimization problem. U is called cost-bounded if, for every
input instance I ∈ L_I with Int(I) = (i_1, i_2, ..., i_n), i_j ∈ ℕ for j = 1, ..., n,

$$\mathrm{cost}(\alpha) \le \sum_{j=1}^{n} i_j$$

for every α ∈ M(I).

22 Note that we consider optimization problems from NPO only, and so any
algorithm working in polynomial time according to |α| is a polynomial-time
algorithm according to |x|.

23 See Theorem 3.6.1.6.



Observe that the condition cost(α) ≤ Σ_{j=1}^{n} i_j is natural, because it is
satisfied for all integer-valued optimization problems considered in this book.
The costs of feasible solutions are usually determined by some subsum of the
sum Σ_{j=1}^{n} i_j.

Theorem 3.6.3.3. Let U ∈ NPO be a cost-bounded integer-valued optimization
problem such that there is a polynomial-time algorithm that, for every
instance x of U, computes a feasible solution for x. If P ≠ NP and U is
strongly NP-hard, then U does not possess an exact, polynomial-time
searchable neighborhood.

Proof. Assume the opposite, i.e., that U = (Σ_I, Σ_O, L, L_I, M, cost, goal) is
strongly NP-hard and has an exact, polynomial-time searchable neighborhood
Neigh. Then we design a pseudo-polynomial-time algorithm A_U for U, which
together with the strong NP-hardness of U contradicts P ≠ NP.

Without loss of generality assume goal = minimum. Since there exists
an exact polynomial-time searchable neighborhood Neigh for U, there exists
a polynomial-time algorithm A that, for every feasible solution α ∈ M(x),
x ∈ L_I, finds a feasible solution β ∈ Neigh_x(α) with cost(α, x) > cost(β, x)
or verifies that α is a local optimum with respect to Neigh_x. By the assumption
of the theorem, there exists a polynomial-time algorithm B that, for every
x ∈ L_I, outputs an α_0 ∈ M(x). Thus, a pseudo-polynomial-time algorithm A_U
for U can work as follows.

Algorithm A_U

Input: An x ∈ L_I.

Step 1: Use B to compute a feasible solution α_0 ∈ M(x).

Step 2: Use A to iteratively improve α_0 until a local optimum with respect
to Neigh_x is found.

Because Neigh is exact, the local optimum computed in Step 2 is an optimal
solution to x. As mentioned above, both A and B work in polynomial time
according to the sizes of their inputs. Since U ∈ NPO, |α| is polynomial in |x|
for every x ∈ L_I and every α ∈ M(x), and so A works in polynomial time
according to |x|, too. Let Int(x) = (i_1, ..., i_n). Since U is a cost-bounded
integer-valued problem, the costs of all feasible solutions of M(x) lie in

$$\{1, 2, \ldots, \textstyle\sum_{j=1}^{n} i_j\} \subseteq \{1, 2, \ldots, n \cdot \text{Max-Int}(x)\}.$$

Since every iterative improvement improves the cost by at least 1, the
number of runs of the algorithm A is bounded by n · Max-Int(x) ≤ |x| · Max-Int(x).
Thus, A_U is a pseudo-polynomial-time algorithm for U. □

Corollary 3.6.3.4. If P ≠ NP, then there exists no exact polynomial-time
searchable neighborhood for TSP, Δ-TSP, and Weight-VCP.

Proof. In Section 3.2.4 we proved that the cost-bounded integer-valued
problems TSP, Δ-TSP, and Weight-VCP are strongly NP-hard. □







The second method for proving the nonexistence of an exact polynomial-time
searchable neighborhood is independent of the presented concept of
pseudo-polynomial-time algorithms. The idea is to prove the NP-hardness
of deciding the optimality of a given solution α to an input instance x.

Definition 3.6.3.5. Let U = (Σ_I, Σ_O, L, L_I, M, cost, goal) be an optimization
problem from NPO. We define the suboptimality decision problem
to U as the decision problem (SUBOPT_U, Σ_I ∪ Σ_O), where

$$\mathrm{SUBOPT}_U = \{(x, \alpha) \in L_I \times \Sigma_O^* \mid \alpha \in M(x) \text{ and } \alpha \text{ is not optimal}\}.$$

Theorem 3.6.3.6. Let U ∈ NPO. If P ≠ NP and SUBOPT_U is NP-hard,
then U does not possess any exact, polynomial-time searchable neighborhood.

Proof. It is sufficient to show that if U possesses an exact, polynomial-time
searchable neighborhood Neigh, then there exists a polynomial-time algorithm
A that decides (SUBOPT_U, Σ_I ∪ Σ_O), i.e., SUBOPT_U ∈ P.

Let (x, α) ∈ L_I × M(x) be the input of A. A starts by searching
Neigh_x(α) in polynomial time. If A finds a better solution than α, then A
accepts (x, α). If A does not find a feasible solution better than α in
Neigh_x(α), then α is an optimal solution to x, because Neigh is an exact
neighborhood. In this case A rejects (x, α). □
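The decision procedure in the proof of Theorem 3.6.3.6 amounts to a single search of the neighborhood. A sketch for a minimization problem follows; it is correct only under the assumption that `neigh` is an exact neighborhood. The toy problem (minimizing (z − 7)² over the integers 0, ..., 20 with neighbors z − 1 and z + 1) is a hypothetical example in which this neighborhood is exact, because every local minimum of a convex function on consecutive integers is a global one.

```python
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")


def decide_subopt(alpha: S,
                  neigh: Callable[[S], Iterable[S]],
                  cost: Callable[[S], int]) -> bool:
    """Accept (return True) iff alpha is not optimal.

    Valid only if neigh is exact: then alpha is suboptimal exactly when
    some neighbor is strictly cheaper (minimization)."""
    return any(cost(beta) < cost(alpha) for beta in neigh(alpha))


# Hypothetical toy problem with an exact neighborhood.
def neigh(z: int) -> List[int]:
    return [w for w in (z - 1, z + 1) if 0 <= w <= 20]


def cost(z: int) -> int:
    return (z - 7) ** 2


if __name__ == "__main__":
    print([z for z in range(21) if not decide_subopt(z, neigh, cost)])  # [7]
```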

We again use TSP to illustrate the applicability of this approach. Our aim
is to prove that SUBOPT_TSP is NP-hard. To do so, we first need to prove that
the following decision problem is NP-hard. The restricted Hamiltonian
cycle problem (RHC) is to decide, for a given graph G = (V, E) and a
Hamiltonian path P in G, whether there exists a Hamiltonian cycle in G.
Formally,

RHC = {(G, P) | P is a Hamiltonian path in G that cannot
               be extended to any Hamiltonian cycle,
               and G contains a Hamiltonian cycle}.

Compared to the inputs of HC, the inputs of RHC contain additional
information, namely a Hamiltonian path in G. The question is whether this
additional information makes the decision problem easier. The answer is
negative. We show it by reducing the NP-complete problem HC to RHC.

Lemma 3.6.3.7.

HC ≤_p RHC

Proof. To find a polynomial-time reduction from HC to RHC, we need the
special-purpose subgraph called diamond that is depicted in Figure 3.15.

Fig. 3.15.

When we take this graph as a subgraph of a more complicated graph G,
we shall allow the connection of the diamond with the rest of G only via
edges incident to the four corner vertices N (north), E (east), S (south),
and W (west). Under this assumption, if G has a Hamiltonian cycle, then
the diamond can be traversed only in one of the two ways depicted in
Figure 3.16. The path depicted in Figure 3.16a is called the North-South mode
of traversing the diamond, and the path depicted in Figure 3.16b is called the
East-West mode of traversing the diamond.





To see this, consider that a Hamiltonian cycle C of G enters the diamond at
the vertex N. The only possibility of visiting x is to continue to x immediately
and then to W. C cannot leave the diamond at the vertex W, because in that
case C could later visit either both u and v or y, but not all three of them.
Thus, C must continue to v. From v, C must go to u, because otherwise u
remains unvisited by C. Being in u, the only possibility to visit the rest of the
diamond is to continue to E, y, and S and to leave the diamond at S. Thus, we
have obtained the North-South mode of traversing the diamond. When C enters
the diamond at any of the vertices W, E, or S, the argument is analogous.

Now we describe a polynomial-time reduction from HC to RHC. Let
G = (V, E) be an instance of HC. We shall construct a graph G' with a
Hamiltonian path P in G' in such a way that G has a Hamiltonian cycle if and
only if G' has a Hamiltonian cycle.

The idea of the construction of G' is to replace the vertices of G with
diamonds and to embed the edges of G into G' by connecting east and west
vertices of the corresponding diamonds. The Hamiltonian path P is created
by connecting all diamonds using the North-South mode.

More precisely, let V = {w_1, w_2, ..., w_n}. We set G' = (V', E'), where
V' = ∪_{i=1}^{n} {N_i, W_i, E_i, S_i, x_i, u_i, v_i, y_i} and E' contains the following
sets of edges:

(i) all edges of the n diamonds, i.e., {{N_i, x_i}, {N_i, u_i}, {W_i, x_i}, {W_i, v_i},
{v_i, u_i}, {E_i, u_i}, {E_i, y_i}, {S_i, v_i}, {S_i, y_i}} for i = 1, 2, ..., n,

(ii) {{S_i, N_{i+1}} | i = 1, 2, ..., n − 1} (Figure 3.17a),

(iii) {{W_i, E_j}, {E_i, W_j} | for all i, j ∈ {1, ..., n} such that {w_i, w_j} ∈ E}
(Figure 3.17b).

Following Figure 3.17a, we see that the edges of class (ii) unambiguously
determine a Hamiltonian path P that starts in N_1, finishes in S_n, and traverses
all diamonds in the North-South mode in the order N_1, S_1, N_2, S_2, ..., N_{n−1},
S_{n−1}, N_n, S_n. Since N_1 and S_n are not connected in G', P cannot be
completed to a Hamiltonian cycle.

It is obvious that G' together with P can be constructed from G in linear
time.

It remains to be shown that G' contains a Hamiltonian cycle if and only if
G contains a Hamiltonian cycle. Let H = w_1, w_{i_1}, w_{i_2}, ..., w_{i_{n−1}}, w_1 be a
Hamiltonian cycle in G. Then

$$W_1, \ldots, E_1, W_{i_1}, \ldots, E_{i_1}, W_{i_2}, \ldots, E_{i_2}, \ldots, W_{i_{n-1}}, \ldots, E_{i_{n-1}}, W_1$$

mimics H by visiting the diamonds in G' in the same order as in H and
moving through every diamond in the East-West mode.

Conversely, let H' be a Hamiltonian cycle in G'. First of all, observe that
H' cannot enter any diamond at its north vertex or at its south vertex. If this
happened, then this diamond would have to be traversed in the North-South
mode, and it would be left via its other pole, whose only outside neighbor is the
north vertex of the next diamond or the south vertex of the previous one (an
edge of class (ii)). This forces the neighboring diamonds into the North-South
mode as well, and so all diamonds must be traversed in the North-South mode.
Since there is no edge incident to N_1 or S_n besides the inner diamond edges,
this is impossible. Thus, H' must traverse all diamonds in the East-West mode.
Obviously, the order in which the diamonds are visited determines the order of
vertices of G that corresponds to a Hamiltonian cycle of G. □
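The construction of G' and P in the proof of Lemma 3.6.3.7 is easy to program. The sketch below uses hypothetical helpers that are not part of the original text: vertices of G are 1, ..., n, a name such as "x3" stands for x_3, the path P traverses every diamond in the North-South mode N, x, W, v, u, E, y, S, and `east_west_cycle` builds the cycle of G' that mimics a given vertex order of G by traversing every diamond in the East-West mode W, x, N, u, v, S, y, E.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

DIAMOND = [("N", "x"), ("N", "u"), ("W", "x"), ("W", "v"), ("v", "u"),
           ("E", "u"), ("E", "y"), ("S", "v"), ("S", "y")]


def hc_to_rhc(n: int, edges: Sequence[Tuple[int, int]]):
    """Return (V', E', P) for G = ({1, ..., n}, edges)."""
    vertices = {f"{c}{i}" for i in range(1, n + 1) for c in "NWESxuvy"}
    e2: Set[FrozenSet[str]] = set()
    for i in range(1, n + 1):                       # (i) the n diamonds
        e2 |= {frozenset((f"{a}{i}", f"{b}{i}")) for a, b in DIAMOND}
    for i in range(1, n):                           # (ii) {S_i, N_(i+1)}
        e2.add(frozenset((f"S{i}", f"N{i + 1}")))
    for i, j in edges:                              # (iii) {W_i,E_j}, {E_i,W_j}
        e2.add(frozenset((f"W{i}", f"E{j}")))
        e2.add(frozenset((f"E{i}", f"W{j}")))
    path = [f"{c}{i}" for i in range(1, n + 1) for c in "NxWvuEyS"]
    return vertices, e2, path


def east_west_cycle(order: Sequence[int]) -> List[str]:
    """Cycle in G' that visits the diamonds in the given order."""
    return [f"{c}{i}" for i in order for c in "WxNuvSyE"]


def is_hamiltonian_cycle(vertices: Set[str], e2: Set[FrozenSet[str]],
                         cycle: List[str]) -> bool:
    return (len(cycle) == len(vertices) and set(cycle) == vertices and
            all(frozenset((cycle[k], cycle[(k + 1) % len(cycle)])) in e2
                for k in range(len(cycle))))


if __name__ == "__main__":
    V, E2, P = hc_to_rhc(3, [(1, 2), (2, 3), (1, 3)])
    print(len(V), len(E2))  # 24 35
```

For the triangle on {1, 2, 3}, G' has 24 vertices and 35 edges (27 diamond edges, 2 edges of class (ii), and 6 of class (iii)), P is a Hamiltonian path from N1 to S3, and the cycle mimicking 1, 2, 3 is a Hamiltonian cycle of G'. For the path 1, 2, 3, 4, the corresponding East-West traversal fails because E4 and W1 are not adjacent.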




Fig. 3.17.



Now we use the NP-hardness of RHC to show that SUBOPT_TSP is NP-hard.

Lemma 3.6.3.8.

RHC ≤_p SUBOPT_TSP



Proof. Let (G, P), where G = (V, E) is a graph and P is a Hamiltonian path
in G, be an instance of RHC. Let G have n vertices v_1, ..., v_n. We construct
an instance (K_n, c) of TSP and a feasible solution α ∈ M(K_n, c) as follows:

(i) K_n is the complete graph on the n vertices of V,

(ii) $$c(\{v_i, v_j\}) = \begin{cases} 1 & \text{if } \{v_i, v_j\} \in E, \\ 2 & \text{if } \{v_i, v_j\} \notin E, \end{cases}$$ and

(iii) α visits the vertices of K_n in the order given by P.

Now we have to prove that (G, P) ∈ RHC if and only if α is not an optimal
solution for (K_n, c).



Let (G, P) ∈ RHC. Then the cost of the Hamiltonian cycle α (determined
by P) in K_n is exactly (n − 1) + 2 = n + 1, because P cannot be extended to
a Hamiltonian cycle. Since G contains a Hamiltonian cycle β, the cost of β in
K_n is exactly n, and so the feasible solution α is not optimal.

Let α be no optimal solution for (K_n, c). Then cost(α) = n + 1. (cost(α) ≥
n + 1, because the minimal possible cost of a Hamiltonian cycle is n and α is not
optimal; cost(α) ≤ n + 1, because α contains the n − 1 edges of P of cost 1.)
Hence, the edge closing P into α has cost 2, i.e., it does not belong to E, and so
P cannot be extended to a Hamiltonian cycle of G. Since α is not optimal,
there exists a Hamiltonian cycle β in K_n with cost(β) = n. Obviously, β is a
Hamiltonian cycle in G, and so (G, P) ∈ RHC. □
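The instance (K_n, c) built in the proof of Lemma 3.6.3.8 can be checked by brute force on tiny graphs. In this sketch (hypothetical helpers; vertices are 0, ..., n − 1 and α is given as the vertex list of P), `opt_cost` enumerates all tours with vertex 0 fixed, which is feasible only for very small n.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

CostFn = Callable[[int, int], int]


def rhc_to_tsp(edges: Sequence[Tuple[int, int]]) -> CostFn:
    """Cost function of (K_n, c): 1 for edges of G, 2 for all other pairs."""
    e = {frozenset(p) for p in edges}
    return lambda u, v: 1 if frozenset((u, v)) in e else 2


def tour_cost(c: CostFn, tour: Sequence[int]) -> int:
    n = len(tour)
    return sum(c(tour[k], tour[(k + 1) % n]) for k in range(n))


def opt_cost(c: CostFn, n: int) -> int:
    """Brute-force optimum over all tours starting at vertex 0."""
    return min(tour_cost(c, (0,) + p) for p in permutations(range(1, n)))


if __name__ == "__main__":
    P = [0, 1, 2, 3]
    c = rhc_to_tsp([(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)])
    print(tour_cost(c, P), opt_cost(c, 4))  # 5 4
```

With P = 0, 1, 2, 3, the graph with edges {0,1}, {1,2}, {2,3}, {0,2}, {1,3} belongs to RHC (its Hamiltonian cycle 0, 1, 3, 2, 0 has cost 4 = n while α costs 5 = n + 1), whereas for the bare path {0,1}, {1,2}, {2,3} the tour α with cost 5 is already optimal.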

Corollary 3.6.3.9. SUBOPT_TSP is NP-hard, and so, if P ≠ NP, TSP does
not possess any exact, polynomial-time searchable neighborhood.

Proof. This assertion is a direct consequence of Lemma 3.6.3.8 and Theorem
3.6.3.6. □

Now we use the structure of the diamond in order to show that there
are pathological instances of TSP for large k-Exchange neighborhoods. Here,
pathological means that the instances have a unique optimal solution and
exponentially many second-best (local) optima whose costs are exponential in
the optimal cost. So, starting from a random initial solution, there is a large
probability that the scheme LSS((n/3)-Exchange) outputs a very poor feasible
solution.

Theorem 3.6.3.10. For every positive integer k ≥ 2 and every large number
M > 2^{8k}, there exists an input instance (K_{8k}, c_M) of TSP such that

(i) there is exactly one optimal solution to (K_{8k}, c_M), with the cost
Opt_TSP(K_{8k}, c_M) = 8k,
(ii) the cost of the second-best feasible solutions (Hamiltonian cycles) is M + 5k,
(iii) there are 2^{k−1}(k − 1)! second-best solutions for (K_{8k}, c_M), and
(iv) every second-best solution differs from the unique optimal solution in
exactly 3k edges (i.e., every second-best solution is a local optimum with
respect to the (3k − 1)-Exchange neighborhood).

Proof. To construct (K_{8k}, c_M) for every k, we have to determine the function
c_M from the edges of the complete graph of 8k vertices to the set of
nonnegative integers. To see the idea behind it, we take the following four steps.

(1) First view K_{8k} as a graph consisting of k diamonds D_i = (V_i, F_i) =
({N_i, W_i, E_i, S_i, x_i, u_i, v_i, y_i}, {{N_i, x_i}, {N_i, u_i}, {W_i, x_i}, {W_i, v_i},
{v_i, u_i}, {E_i, y_i}, {E_i, u_i}, {S_i, v_i}, {S_i, y_i}}) for i = 1, 2, ..., k, as depicted in
Figure 3.15. Following the proof of Lemma 3.6.3.7, we know that if a graph
G contains a diamond D_i such that there are no edges connecting the vertices
x_i, u_i, v_i, y_i with the rest of the graph, and G does not contain any edge
between the vertices of V_i aside from the edges in F_i, then every Hamiltonian
cycle in G traverses D_i either in the North-South mode or in the East-West
mode (Figure 3.16). Obviously, we cannot force some edges to be missing in
K_{8k}, but we can assign very expensive costs to these edges in order to be
sure that no optimal solution and no second-best feasible solution can
contain any of them.

Set the cost of all edges incident to the vertices x_i, u_i, v_i, y_i, except the
edges of the diamond D_i, to 2·M for i = 1, 2, ..., k. Set the cost of all edges in
{{r, s} | r, s ∈ V_i, r ≠ s} − F_i to 2·M for i = 1, 2, ..., k.




(2) Connect the $k$ diamonds $D_i$ by the edges $\{E_i, W_{(i \bmod k)+1}\}$. This results in a graph with exactly one Hamiltonian cycle²⁴

$$H_{E\text{-}W} = W_1, \ldots, E_1, W_2, \ldots, E_2, W_3, \ldots, W_k, \ldots, E_k, W_1$$

that traverses all diamonds in the East-West mode (see Figure 3.18). Assign to every edge of $H_{E\text{-}W}$ the cost 1. Thus, $cost(H_{E\text{-}W}) = 8k$. In this way we have assigned the weight 1 to all edges of the diamond $D_i$, apart from the edges $\{W_i, v_i\}$ and $\{u_i, E_i\}$. Set $c_M(\{W_i, v_i\}) = c_M(\{u_i, E_i\}) = 0$. Since we assign the value $2 \cdot M$ to all remaining edges between the vertices of the set

$$W\text{-}E = \{W_1, \ldots, W_k, E_1, \ldots, E_k\},$$

every Hamiltonian cycle different from $H_{E\text{-}W}$ and traversing the diamonds in the East-West mode has cost at least $2 \cdot M \geq 2^{8k+1}$.

(3) Let

$$N\text{-}S = \{N_1, N_2, \ldots, N_k, S_1, S_2, \ldots, S_k\}.$$

Assign to all edges between the vertices of $N\text{-}S - \{N_1\}$ the cost 0 (i.e., the vertices of $N\text{-}S - \{N_1\}$ build a clique of edges of cost 0 in $K_{8k}$). Set $c_M(\{N_1, N_i\}) = c_M(\{N_1, S_i\}) = M$ for all $i \in \{2, 3, \ldots, k\}$. This implies that every Hamiltonian cycle traversing all diamonds in the North-South mode has the cost exactly $M + 5k$. (Observe that in traversing a diamond in North-South mode one goes via 5 edges of cost 1 and via 2 edges of cost 0.)

(4) For all edges not considered in (1), (2), and (3), assign the value $2 \cdot M$.



²⁴ If one assumes that the edges of cost $2M$ are forbidden.







Now let us show that the constructed problem instance has the required properties.

(i) Consider the subgraph $G'$ obtained from $K_{8k}$ by removing all edges of cost $M$ and $2M$. $G'$ contains only one Hamiltonian cycle, $H_{E\text{-}W}$, of cost $8k$. This is because $G'$ consists of diamonds connected via the vertices $N_i$, $W_i$, $E_i$, and $S_i$ only, and so every Hamiltonian tour in $G'$ must traverse all diamonds either in the North-South mode or in the East-West mode. $H_{E\text{-}W}$ is the only Hamiltonian cycle going in the East-West mode because $c_M(\{E_i, W_j\}) = 2M$ for all $i, j \in \{1, \ldots, k\}$, $j \neq (i \bmod k) + 1$ (see Figure 3.18). There is no Hamiltonian tour traversing the diamonds in the North-South mode because $c_M(\{N_1, S_j\}) = c_M(\{N_1, N_j\}) = M$ for all $j \in \{2, \ldots, k\}$, so that $N_1$ is incident to no edge outside $D_1$ in $G'$.

Thus, $H_{E\text{-}W}$ is the unique optimal solution to $(K_{8k}, c_M)$ because any other Hamiltonian cycle must contain at least one edge with cost at least $M \geq 2^{8k} > 8k$ for $k \geq 2$.

(ii) Consider the subgraph $G''$ obtained from $K_{8k}$ by removing all edges of cost $2M$ (i.e., consisting of edges of cost 0, 1, and $M$). $G''$ also has the property that every Hamiltonian cycle traverses all diamonds either in the North-South mode or in the East-West mode. As already claimed above in (i), every Hamiltonian cycle different from $H_{E\text{-}W}$ and traversing diamonds in the East-West mode must contain at least one edge of cost $2M$, and so it is not in $G''$. Consider the North-South mode now. To traverse a diamond in this mode costs exactly 5 (5 edges with cost 1 and 2 edges with cost 0), and so, to traverse all diamonds costs exactly $5k$. The edges $\{N_i, S_j\}$ between the diamonds have cost 0 for $i \neq 1$. Since the Hamiltonian cycle must use an edge connecting $N_1$ with some $S_j$ or $N_j$ with $j \in \{2, 3, \ldots, k\}$, one edge of cost $M$ must be used. Thus, the cost of any Hamiltonian cycle of $G''$ in the North-South mode is at least $M + 5k$, and

$$N_1, S_2, \ldots, N_2, S_3, \ldots, N_3, S_4, \ldots, S_k, \ldots, N_k, S_1, \ldots, N_1$$

is a Hamiltonian cycle of cost $M + 5k$. Since any Hamiltonian cycle of $K_{8k}$ that is not a Hamiltonian cycle of $G''$ must contain at least one edge of cost $2M > M + 5k$, the cost of the second-best solution for the input instance $(K_{8k}, c_M)$ is $M + 5k$.

(iii) We count the number of Hamiltonian cycles of $G''$ that traverse all diamonds in the North-South direction and that contain exactly one edge of cost $M$ (i.e., cycles with cost $M + 5k$). Read every such cycle starting at $N_1$ and entering $D_1$ first, so that $D_1$ is always traversed from $N_1$ to $S_1$. Since the vertices of $N\text{-}S - \{N_1\}$ build a clique of edges of cost 0 in $G''$, we have $(k-1)!$ orders in which the remaining diamonds $D_2, D_3, \ldots, D_k$ can be traversed. Additionally, for each of the $k - 1$ diamonds $D_2, \ldots, D_k$ one can choose one of the two directions of the North-South mode (from $N_i$ to $S_i$ or from $S_i$ to $N_i$). Thus, there are altogether

$$2^{k-1}(k-1)!$$

distinct Hamiltonian cycles of cost $M + 5k$.







(iv) Finally, observe that the optimal solution $H_{E\text{-}W}$ and any second-best Hamiltonian cycle have exactly $5k$ edges in common: five inner edges of cost 1 in each diamond. Since both cycles have $8k$ edges, they differ in exactly $3k$ edges. Since the only solution better than a second-best solution is the optimal solution $H_{E\text{-}W}$, every second-best solution is a local optimum with respect to any neighborhood $Neigh$ where the optimal solution is not in $Neigh(\alpha)$ for any second-best solution $\alpha$. Since $(3k-1)$-Exchange$(\alpha)$ does not contain $H_{E\text{-}W}$ for any second-best solution $\alpha$, all second-best solutions of $(K_{8k}, c_M)$ are local optima with respect to $(3k-1)$-Exchange. □
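For small $k$ the construction can be checked mechanically. The following Python sketch builds $(K_{8k}, c_M)$ along steps (1) to (4), under two stated assumptions: the diamond edge set is $F_i = \{\{N_i,x_i\},\{N_i,u_i\},\{W_i,x_i\},\{W_i,v_i\},\{v_i,u_i\},\{E_i,y_i\},\{E_i,u_i\},\{S_i,v_i\},\{S_i,y_i\}\}$ with cost 0 exactly on $\{W_i,v_i\}$ and $\{u_i,E_i\}$, and the cost-$M$ edges are $\{N_1,N_i\}$, $\{N_1,S_i\}$ for $i \geq 2$. It then enumerates all Hamiltonian cycles of $G''$ (edges of cost below $2M$) by exhaustive backtracking, which is only feasible for tiny $k$. All function names are hypothetical.

```python
from itertools import combinations

def diamond_edges(i):
    """Edges of diamond D_i with their costs (assumed edge set F_i, see above)."""
    N, E, S, W, x, u, v, y = (f"{c}{i}" for c in "NESWxuvy")
    cost1 = [(N, x), (N, u), (W, x), (v, u), (E, y), (S, v), (S, y)]
    cost0 = [(W, v), (u, E)]
    return {**{frozenset(e): 1 for e in cost1}, **{frozenset(e): 0 for e in cost0}}

def build_instance(k, M):
    """Cost function c_M on all edges of K_{8k}, following steps (1)-(4)."""
    vertices = [f"{c}{i}" for i in range(1, k + 1) for c in "NESWxuvy"]
    cost = {frozenset(p): 2 * M for p in combinations(vertices, 2)}  # default 2M
    for i in range(1, k + 1):
        cost.update(diamond_edges(i))                      # step (1)
        cost[frozenset((f"E{i}", f"W{i % k + 1}"))] = 1    # step (2): connectors
    clique = [f"N{i}" for i in range(2, k + 1)] + [f"S{i}" for i in range(1, k + 1)]
    for a, b in combinations(clique, 2):                   # step (3): cost-0 clique
        cost[frozenset((a, b))] = 0
    for i in range(2, k + 1):                              # step (3): cost-M edges
        cost[frozenset(("N1", f"N{i}"))] = M
        cost[frozenset(("N1", f"S{i}"))] = M
    return vertices, cost

def hamiltonian_cycles_below(vertices, cost, bound):
    """Edge sets of all Hamiltonian cycles that use only edges of cost < bound."""
    adj = {v: [w for w in vertices if w != v and cost[frozenset((v, w))] < bound]
           for v in vertices}
    start, n, cycles = vertices[0], len(vertices), set()

    def extend(path, visited):
        if len(path) == n:
            if start in adj[path[-1]]:
                # an edge set identifies the cycle independently of its direction
                cycles.add(frozenset(frozenset((path[j], path[(j + 1) % n]))
                                     for j in range(n)))
            return
        ends = {start, path[-1]}
        # prune: each unvisited vertex needs two usable neighbours on a cycle
        for w in vertices:
            if w not in visited and sum(1 for z in adj[w]
                                        if z not in visited or z in ends) < 2:
                return
        for w in adj[path[-1]]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                extend(path, visited)
                path.pop()
                visited.remove(w)

    extend([start], {start})
    return cycles

def summarize(k):
    M = 2 ** (8 * k)
    vertices, cost = build_instance(k, M)
    cycles = hamiltonian_cycles_below(vertices, cost, 2 * M)  # cycles of G''

    def total(c):
        return sum(cost[e] for e in c)

    ordered = sorted(cycles, key=total)
    opt = ordered[0]
    return {
        "M": M,
        "costs": [total(c) for c in ordered],
        "common_with_opt": sorted({len(c & opt) for c in ordered[1:]}),
    }

if __name__ == "__main__":
    for k in (2, 3):
        print(k, summarize(k))
```

For $k = 2$ and $k = 3$ one should see the cost $8k$ exactly once, the cost $M + 5k$ exactly $2^{k-1}(k-1)!$ times, and $5k$ edges shared by every second-best cycle and $H_{E\text{-}W}$, matching (i) to (iv) under the assumptions above.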

Keywords introduced in Section 3.6 

neighborhood on sets of feasible solutions, local transformation, local optimum 
with respect to a neighborhood, local search scheme, k-Exchange neighbor- 
hood for TSP, Lin-Kernighan’s variable-depth neighborhood, exact neighborhood, 
polynomial-time searchable neighborhood, cost-bounded integer valued problem, 
suboptimality decision problem, restricted Hamiltonian cycle problem 

Summary of Section 3.6 

Local search is an algorithm design technique for optimization problems. The first 
step of this technique is to define a neighborhood on the sets of feasible solutions. 
Usually, neighborhoods are defined by so-called local transformations in such a 
way that two feasible solutions $\alpha$ and $\beta$ are neighbors if one can obtain $\alpha$ from $\beta$ (and vice versa) by some local change of the specification of $\beta$ ($\alpha$). Having a neighborhood, local search starts with an initial feasible solution and then iteratively tries to find a better solution by searching the neighborhood of the current solution. The local search algorithm halts with a solution $\alpha$ whose neighborhood does not contain any feasible solution better than $\alpha$. Thus, $\alpha$ is a local optimum according to the neighborhood considered.
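A minimal sketch of this scheme for TSP, assuming the 2-Exchange neighborhood and acceptance of the first improving neighbor (both are our choices, made only for brevity); cities are indices into a symmetric distance matrix `dist`, and all names are hypothetical.

```python
import math
import random

EPS = 1e-12  # guards against "improvements" that are only floating-point noise

def tour_cost(tour, dist):
    """Cost of the Hamiltonian cycle visiting the cities in the order of tour."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_exchange_neighbors(tour):
    """Tours obtained by removing two non-adjacent edges and reconnecting the paths."""
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # both removed edges would contain tour[0]
            # remove {tour[i], tour[i+1]} and {tour[j], tour[j+1]}; reverse the middle
            yield tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def local_search(initial, dist):
    """LSS(2-Exchange): improve while possible, then return a local optimum."""
    current = list(initial)
    current_cost = tour_cost(current, dist)
    improved = True
    while improved:
        improved = False
        for neighbor in two_exchange_neighbors(current):
            cost = tour_cost(neighbor, dist)
            if cost < current_cost - EPS:
                current, current_cost, improved = neighbor, cost, True
                break  # first improvement
    return current

def euclidean_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]

if __name__ == "__main__":
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(12)]
    d = euclidean_matrix(pts)
    start = list(range(len(pts)))
    random.shuffle(start)
    best = local_search(start, d)
    print(round(tour_cost(start, d), 3), "->", round(tour_cost(best, d), 3))
```

The loop stops exactly when no 2-Exchange neighbor is cheaper, i.e., at a local optimum; nothing in the scheme bounds how far that optimum is from the optimal cost.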

Any local search algorithm finishes with a local optimum. In general, there is no guarantee that the costs of local optima are close to the cost of an optimal solution, nor that local search algorithms work in polynomial time. For TSP there exist input instances with a unique optimal solution and exponentially many very poor second-best solutions that are all local optima according to the $(n/3)$-Exchange neighborhood.

The following tradeoff exists between the time complexity and the quality of 
local optima. Small neighborhoods usually mean an efficient execution of one im- 
provement but they often increase the number of local optima (i.e., the probability 
of getting stuck in a very weak local optimum). On the other hand, large neigh- 
borhoods can be expected to yield feasible solutions of higher quality, but the time 
complexity to verify local optimality becomes too large to be profitable. A way to 
overcome this difficulty may be the Kernighan-Lin variable-depth search, which
is a compromise between the choice of a small neighborhood and the choice of 
a large neighborhood. Very roughly, one uses a greedy approach to search for an 







improvement in a large neighborhood instead of an exhaustive search of a small 
neighborhood. 

A neighborhood is called exact if every local optimum with respect to this 
neighborhood is a total optimum, too. A neighborhood is called polynomial-time 
searchable, if there exists a polynomial-time algorithm that finds the best solution in 
the neighborhood of every feasible solution. The existence of an exact polynomial-time searchable neighborhood for an integer-valued optimization problem U usually implies that the local search algorithm according to this neighborhood is a pseudo-polynomial-time algorithm for U.

If an optimization problem is strongly NP-hard, then it does not possess any exact, polynomial-time searchable neighborhood (unless P = NP). TSP, Δ-TSP, and Weight-VCP are examples of such problems.

TSP has pathological input instances for the local search with the large 
n/3-Exchange neighborhood. These pathological instances have a unique optimal 
solution and exponentially many second-best local optima whose costs are expo- 
nential in the optimal cost. 



3.7 Relaxation to Linear Programming 

3.7.1 Basic Concept 

Linear Programming (LP) is an optimization problem that can be solved in polynomial time, while 0/1-linear programming (0/1-LP) and integer linear programming (IP) are NP-hard. All these optimization problems have the common constraints $A \cdot X = b$ for a matrix $A$ and a vector $b$ and the same objective to minimize $c^T \cdot X$ for a given vector $c$. The only difference is that one minimizes over reals in LP, while the feasible solutions are over $\{0, 1\}$ for 0/1-LP and over integers for IP. This difference is essential because it determines the hardness of these problems.

Another important point is that many hard problems can be easily reduced to 0/1-LP or to IP; more precisely, many hard problems are naturally expressible in the form of linear programming problems. Thus, linear programming problems have become the paradigmatic problems of combinatorial optimization and operations research.

Since LP is polynomial-time solvable, and 0/1-LP and IP are NP-hard, a very natural idea is to solve problem instances of 0/1-LP and IP as instances of the efficiently solvable LP. This approach is called relaxation because one relaxes the requirement of finding an optimal solution over $\{0, 1\}$ or over nonnegative integers by searching for an optimal solution over reals. Obviously, the computed optimal solution $\alpha$ for LP does not need to be a feasible solution for 0/1-LP or IP. Thus, one can ask what the gain of this approach is. First of all, $cost(\alpha)$ is a lower bound²⁵ on the cost of the optimal solutions

²⁵ Remember that we consider minimization problems. For maximization problems $cost(\alpha)$ is an upper bound on the optimal cost.







with respect to 0/1-LP and IP. Thus, $cost(\alpha)$ can be a good approximation of the cost of the optimal solutions of the original problem. This, combined with the primal-dual method in Section 3.7.4, can be very helpful as a precomputation for a successful application of the branch-and-bound method as described in Section 3.4. Another possibility is to use $\alpha$ to compute a solution $\beta$ that is feasible for 0/1-LP or IP. This can be done, for instance, by (randomized) rounding of real values to the values 0 and 1 or to integers. For some problems, such a feasible solution $\beta$ is of high quality in the sense that it is reasonably close to an optimal solution. Some examples of such successful applications are given in Chapters 4 and 5.

Summarizing the considerations above, the algorithm design technique of 
relaxation to linear programming consists of the following three steps: 

(1) Reduction 

Express a given instance x of an optimization problem U as an input 
instance I(x) of 0/1-LP or IP. 

(2) Relaxation 

Consider $I(x)$ as an instance of LP and compute an optimal solution $\alpha$ to $I(x)$ by an algorithm for linear programming.

(3) Solving the original problem

Use $\alpha$ to either compute an optimal solution for the original problem, or to find a high-quality feasible solution for the original problem.
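The three steps can be illustrated on Weight-VCP, whose reduction to 0/1-LP is given in Section 3.7.2. In this sketch, Step (2) uses a deliberately naive exact LP solver that enumerates the vertices of the polytope with rational arithmetic; it runs in exponential time and only stands in for a real polynomial-time LP algorithm. Step (3) rounds every $x_i \geq 1/2$ up to 1, which is one simple rounding choice of the kind mentioned above. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def solve_square(rows, rhs):
    """Solve a square linear system exactly (Gauss-Jordan); None if singular."""
    n = len(rows)
    a = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return None
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                a[r] = [p - f * q for p, q in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def lp_min_canonical(A, b, c):
    """min c.x s.t. A x >= b, x >= 0 by vertex enumeration (toy sizes only).
    Assumes a finite optimum exists, e.g. c >= 0 and the LP is feasible."""
    n = len(c)
    cons = [(list(row), bj) for row, bj in zip(A, b)]
    cons += [([1 if i == j else 0 for i in range(n)], 0) for j in range(n)]  # x_j >= 0
    best = None
    for chosen in combinations(cons, n):  # n tight constraints may define a vertex
        x = solve_square([r for r, _ in chosen], [bj for _, bj in chosen])
        if x is None:
            continue
        if all(sum(ri * xi for ri, xi in zip(r, x)) >= bj for r, bj in cons):
            value = sum(ci * xi for ci, xi in zip(c, x))
            if best is None or value < best[0]:
                best = (value, x)
    return best

def vertex_cover_to_lp(n, edges, weights):
    """Step (1): Weight-VCP instance -> canonical-form instance (A, b, c)."""
    A = []
    for i, j in edges:
        row = [0] * n
        row[i] = row[j] = 1  # x_i + x_j >= 1
        A.append(row)
    return A, [1] * len(edges), list(weights)

def relax_and_round(n, edges, weights):
    A, b, c = vertex_cover_to_lp(n, edges, weights)              # step (1)
    lp_value, alpha = lp_min_canonical(A, b, c)                  # step (2)
    cover = [i for i in range(n) if alpha[i] >= Fraction(1, 2)]  # step (3)
    return lp_value, alpha, cover

def optimal_cover_cost(n, edges, weights):
    """Exact optimum of the 0/1 program by brute force, for comparison only."""
    return min(sum(w for w, xi in zip(weights, x) if xi)
               for x in product((0, 1), repeat=n)
               if all(x[i] + x[j] >= 1 for i, j in edges))

if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    for w in ([1, 1, 1], [1, 1, 3]):
        print(w, relax_and_round(3, triangle, w), optimal_cover_cost(3, triangle, w))
```

On the unweighted triangle the relaxation gives $cost(\alpha) = 3/2$, a lower bound on the optimal cost 2, and rounding returns all three vertices (cost 3). With weights $(1, 1, 3)$ the LP optimum $(1, 1, 0)$ is already integral, so Step (3) yields an optimal cover directly.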

While the first two steps of the above scheme can be performed in polynomial time, there is no guarantee that the third step can be executed efficiently. If the original optimization problem is NP-hard, the task to find an optimal solution in Step 3 is NP-hard, too. Nevertheless, this approach is very practical because for many problems the bound given by the value $cost(\alpha)$ helps to speed up the branch-and-bound method. This approach may be especially successful if one relaxes the requirement to find an optimal solution to the requirement to compute a reasonably good solution. For several problems one can even give some guarantee on the quality of the output of this scheme. Examples of this kind are presented in the next two chapters.

Section 3.7 is organized as follows. In Section 3.7.2 we illustrate the first step of the above scheme by presenting a few “natural” reductions to 0/1-LP and to IP. Natural means that there is a one-to-one correspondence between the feasible solutions of the original optimization problem and the resulting 0/1-linear programming (integer programming) problem. If we say that IP expresses another optimization problem $U$, then this even means that the formal representation of solutions in $U$ and IP are exactly the same²⁶ ($\mathcal{M}_U(x) = \mathcal{M}_{IP}(y)$ for every pair of corresponding input instances $x$ and $y$) and two corresponding solutions of IP and $U$ specified by the same representation $\alpha$ also have the same cost. In order to give at least an outline of how to

²⁶ For instance, vectors from $\{0, 1\}^n$.







solve the linear programming problems, Section 3.7.3 gives a short description 
of the simplex method that is in fact a simple local search algorithm. 



3.7.2 Expressing Problems as Linear Programming Problems 

In Section 2.3.2 we introduced the linear programming problem as to minimize

$$c^T \cdot X = \sum_{i=1}^{n} c_i x_i$$

under the constraints

$$A \cdot X = b, \text{ i.e., } \sum_{i=1}^{n} a_{ji} x_i = b_j \text{ for } j = 1, \ldots, m,$$
$$x_i \geq 0 \text{ for } i = 1, \ldots, n \quad (\text{i.e., } X \in (\mathbb{R}^{\geq 0})^n)$$

for every input instance $A = [a_{ji}]_{j=1,\ldots,m,\ i=1,\ldots,n}$, $b = (b_1, \ldots, b_m)^T$, and $c = (c_1, \ldots, c_n)^T$ over reals. In what follows we call this form of LP the standard (equality) form. Remember that the integer linear programming problem is defined in the same way except that all coefficients are integers and the solution $X \in \mathbb{Z}^n$.

There are several forms of LP. First of all we show that all these forms can be expressed in the standard form. The canonical form of LP is to minimize

$$c^T \cdot X = \sum_{i=1}^{n} c_i \cdot x_i$$

under the constraints

$$A X \geq b, \text{ i.e., } \sum_{i=1}^{n} a_{ji} x_i \geq b_j \text{ for } j = 1, \ldots, m,$$
$$x_i \geq 0 \text{ for } i = 1, \ldots, n$$

for every input instance $(A, b, c)$.

Now we transform the canonical form to the standard (equality) form. This means we have to replace the inequality constraint $AX \geq b$ by some equality constraint $B \cdot Y = b$. For every inequality constraint

$$\sum_{i=1}^{n} a_{ji} x_i \geq b_j$$

we add a new variable $s_j$ (called surplus variable) and require

$$\sum_{i=1}^{n} a_{ji} x_i - s_j = b_j, \quad s_j \geq 0.$$







Thus, if $A = [a_{ji}]_{j=1,\ldots,m,\ i=1,\ldots,n}$, $b = (b_1, \ldots, b_m)^T$, and $c = (c_1, \ldots, c_n)^T$ is the problem instance in the canonical form with the set of feasible solutions $\{X \in (\mathbb{R}^{\geq 0})^n \mid AX \geq b\}$, then the corresponding instance is expressed in the standard equality form $(B, b, d)$, where

$$B = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} & -1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & 0 & 0 & \cdots & -1 \end{pmatrix},$$

$$d = (c_1, \ldots, c_n, 0, \ldots, 0)^T \in \mathbb{R}^{m+n}.$$

The set of solutions for $(B, b, d)$ is

$$Sol(B, b, d) = \{Y = (x_1, \ldots, x_n, s_1, \ldots, s_m)^T \in (\mathbb{R}^{\geq 0})^{m+n} \mid B \cdot Y = b\}.$$

We observe that

(i) for every feasible solution $\alpha = (\alpha_1, \ldots, \alpha_n)^T$ to $(A, b, c)$ in the canonical form of LP, there exists $(\beta_1, \ldots, \beta_m)^T \in (\mathbb{R}^{\geq 0})^m$ such that

$$(\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_m)^T \in Sol(B, b, d),$$

and, vice versa,

(ii) for every feasible solution $(\delta_1, \ldots, \delta_n, \gamma_1, \ldots, \gamma_m)^T \in Sol(B, b, d)$, the vector $(\delta_1, \ldots, \delta_n)^T$ is a feasible solution to the input instance $(A, b, c)$ in the canonical form of LP.

Since $d = (c_1, \ldots, c_n, 0, \ldots, 0)^T$, the minimization in $Sol(B, b, d)$ is equivalent to the minimization of $\sum_{i=1}^{n} c_i x_i$ in $\{X \in (\mathbb{R}^{\geq 0})^n \mid AX \geq b\}$. We see that the size of $(B, b, d)$ is linear in the size of $(A, b, c)$, and so the above transformation can be realized efficiently.
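A small sketch of this transformation, with matrices as Python lists of rows; the function names are hypothetical. `lift` realizes observation (i) by choosing the surplus values $\beta_j = (A\alpha)_j - b_j \geq 0$, and `project` realizes observation (ii).

```python
def canonical_to_standard(A, b, c):
    """(A, b, c) with A x >= b, x >= 0  ->  (B, b, d) with B y = b, y >= 0."""
    m = len(A)
    # append the block -I_m: one surplus variable s_j per inequality
    B = [list(row) + [-1 if k == j else 0 for k in range(m)] for j, row in enumerate(A)]
    d = list(c) + [0] * m  # surplus variables do not contribute to the cost
    return B, list(b), d

def lift(A, b, x):
    """Feasible x of the canonical instance -> Y = (x, s) with s_j = (A x)_j - b_j."""
    return list(x) + [sum(a * xi for a, xi in zip(row, x)) - bj for row, bj in zip(A, b)]

def project(y, n):
    """Y in Sol(B, b, d) -> feasible solution of the canonical instance."""
    return list(y[:n])

def matvec(M, v):
    return [sum(p * q for p, q in zip(row, v)) for row in M]

if __name__ == "__main__":
    A, b, c = [[1, 2], [3, 1]], [4, 3], [1, 1]
    B, b2, d = canonical_to_standard(A, b, c)
    y = lift(A, b, [2, 1])
    print(B, y, matvec(B, y) == b2)
```

For $x = (2, 1)$ the surplus values are $(0, 4)$, and $d^T Y = c^T x = 3$, as the equivalence of the two minimizations requires.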

Next, we consider the standard inequality form of LP. One has to minimize

$$c^T \cdot X = \sum_{i=1}^{n} c_i x_i$$

under the constraints

$$A X \leq b, \text{ i.e., } \sum_{i=1}^{n} a_{ji} x_i \leq b_j \text{ for } j = 1, \ldots, m, \text{ and}$$
$$x_i \geq 0 \text{ for } i = 1, \ldots, n$$

for a given instance $A$, $b$, $c$. To transform $\sum_{i=1}^{n} a_{ji} x_i \leq b_j$ to an equation we introduce a new variable $s_j$ (called slack variable) and set

$$\sum_{i=1}^{n} a_{ji} x_i + s_j = b_j \text{ and } s_j \geq 0.$$







Analogously to the transformation from the canonical form to the standard form, it is easy to observe that the derived standard equality form is equivalent to the given standard inequality form.
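The slack construction differs from the surplus construction only in the sign of the appended identity block. A sketch with hypothetical names, tried on the constraints $x_1 + x_2 \leq 8$, $x_2 \leq 6$, $x_1 - x_2 \leq 4$ of the example in $\mathbb{R}^2$ from Section 3.7.3:

```python
def inequality_to_standard(A, b, c):
    """(A, b, c) with A x <= b, x >= 0  ->  (B, b, d) with B y = b, y >= 0."""
    m = len(A)
    # append the block +I_m: one slack variable s_j per inequality
    B = [list(row) + [1 if k == j else 0 for k in range(m)] for j, row in enumerate(A)]
    return B, list(b), list(c) + [0] * m

def slack_values(A, b, x):
    """s_j = b_j - (A x)_j; x satisfies A x <= b exactly when every s_j >= 0."""
    return [bj - sum(a * xi for a, xi in zip(row, x)) for row, bj in zip(A, b)]

if __name__ == "__main__":
    A, b = [[1, 1], [0, 1], [1, -1]], [8, 6, 4]
    B, b2, d = inequality_to_standard(A, b, [1, -1])
    print(B)
    print(slack_values(A, b, [6, 2]), slack_values(A, b, [7, 2]))
```

The point $(6, 2)$ gets the slack vector $(0, 4, 0)$, so it lies on the first and third boundary lines; $(7, 2)$ gets negative slacks and is infeasible.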

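Exercise 3.7.2.1. Consider the general form of LP defined as follows. For an input instance

$$A = [a_{ji}]_{j=1,\ldots,m,\ i=1,\ldots,n},\quad b = (b_1, \ldots, b_m)^T,\quad c = (c_1, \ldots, c_n)^T,$$
$$M \subseteq \{1, \ldots, m\},\quad Q \subseteq \{1, \ldots, n\},$$

one has to minimize

$$\sum_{i=1}^{n} c_i x_i$$

under the constraints

$$\sum_{i=1}^{n} a_{ji} x_i = b_j \text{ for } j \in M,$$
$$\sum_{i=1}^{n} a_{ri} x_i \geq b_r \text{ for } r \in \{1, \ldots, m\} - M,$$
$$x_i \geq 0 \text{ for } i \in Q.$$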

Reduce this general form to 

(i) the standard form, 

(ii) the standard inequality form, and 

(iii) the canonical form. 

□ 

The reason to consider all these forms of LP is to simplify the transfor- 
mation of combinatorial problems to LP. Thus, one can choose the form that 
naturally expresses the original optimization problem. Since all forms of LP 
introduced above are minimization problems, one could ask how to express 
maximization problems as linear programming problems. The answer is very 
simple. Consider, for an input $A$, $b$, $c$, the task

$$\text{maximize } c^T \cdot X = \sum_{i=1}^{n} c_i x_i$$

under some linear constraints given by $A$ and $b$. Then this task is equivalent to the task

$$\text{minimize } [(-1) \cdot c^T] \cdot X = \sum_{i=1}^{n} (-c_i) \cdot x_i$$

under the same constraints.

In what follows we express some optimization problems as LP problems. 







MINIMUM WEIGHTED VERTEX COVER. 

Remember that the input instances of Weight-VCP are weighted graphs $G = (V, E, c)$, $c: V \to \mathbb{N} - \{0\}$. The goal is to find a vertex cover $S$ with a minimal cost $\sum_{v \in S} c(v)$. Let $V = \{v_1, \ldots, v_n\}$.

To express this instance of the Weight-VCP problem as an instance of LP (in fact of 0/1-LP), we represent the sets $S \subseteq V$ by their characteristic vectors $X_S = (x_1, \ldots, x_n) \in \{0, 1\}^n$, where

$$x_i = 1 \text{ iff } v_i \in S.$$

The constraint to cover all edges from $E$ can be expressed by

$$x_i + x_j \geq 1 \text{ for every } \{v_i, v_j\} \in E$$

because for every edge $\{v_i, v_j\}$ at least one of the incident vertices must be in the vertex cover. The goal is to minimize

$$\sum_{i=1}^{n} c(v_i) \cdot x_i.$$

Relaxing $x_i \in \{0, 1\}$ to $x_i \geq 0$ for $i = 1, \ldots, n$ we obtain the canonical form of LP.

KNAPSACK PROBLEM. 

Let $w_1, w_2, \ldots, w_n, c_1, c_2, \ldots, c_n$, and $b$ be an instance of the knapsack problem. We consider $n$ Boolean variables $x_1, x_2, \ldots, x_n$, where $x_i = 1$ indicates that the $i$-th object was packed in the knapsack. Thus, the task is to maximize

$$\sum_{i=1}^{n} c_i x_i$$

under the constraints

$$\sum_{i=1}^{n} w_i x_i \leq b, \text{ and}$$
$$x_i \in \{0, 1\} \text{ for } i = 1, \ldots, n.$$

Exchanging the maximization of $\sum_{i=1}^{n} c_i x_i$ for the minimization of

$$\sum_{i=1}^{n} (-c_i) \cdot x_i$$

one obtains an input instance of 0/1-LP in the standard inequality form. Relaxing $x_i \in \{0, 1\}$ to $x_i \geq 0$ for $i = 1, \ldots, n$ results in the standard inequality form of LP.
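A brute-force sketch comparing the optimum of this 0/1-LP with the optimum of its relaxation (only $x_i \geq 0$, as above) on a made-up instance; the instance and the names are hypothetical. For the relaxation, $\sum c_i x_i = \sum (c_i/w_i)\, w_i x_i \leq \max_i (c_i/w_i) \cdot b$, and the bound is attained by putting the whole capacity on the object with the best ratio, so its value can be written down directly (assuming all $w_i > 0$). In the minimization form both values simply change sign.

```python
from fractions import Fraction
from itertools import product

def knapsack_01(w, c, b):
    """Optimum of: maximize sum c_i x_i s.t. sum w_i x_i <= b, x in {0,1}^n."""
    best = 0
    for x in product((0, 1), repeat=len(w)):
        if sum(wi * xi for wi, xi in zip(w, x)) <= b:
            best = max(best, sum(ci * xi for ci, xi in zip(c, x)))
    return best

def relaxation_value(w, c, b):
    """Optimum of the relaxation with only x_i >= 0 (all w_i > 0 assumed)."""
    return max(Fraction(ci, wi) for wi, ci in zip(w, c)) * b

if __name__ == "__main__":
    w, c, b = [5, 4, 3], [10, 7, 5], 8
    print(knapsack_01(w, c, b), relaxation_value(w, c, b))
```

Here the 0/1 optimum is 15 (objects 1 and 3), while the relaxation reaches 16 with $x_1 = 8/5$; the relaxed value is an upper bound, as expected for a maximization problem.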







MAXIMUM MATCHING PROBLEM. 

The maximum matching problem is to find a matching of maximal cardinality in a given graph $G = (V, E)$. Remember that a matching in $G$ is a set of edges $H \subseteq E$ with the property that, for all edges $\{u, v\}, \{x, y\} \in H$, $\{u, v\} \neq \{x, y\}$ implies $|\{u, v, x, y\}| = 4$ (i.e., no two edges of a matching share a common vertex).

To express the instances of this problem as instances of 0/1-LP we consider Boolean variables $x_e$ for every $e \in E$, where

$$x_e = 1 \text{ iff } e \in H.$$

Let, for every $v \in V$, $E(v) = \{\{v, u\} \in E \mid u \in V\}$ be the set of all edges incident to $v$. Now, the task is to maximize

$$\sum_{e \in E} x_e$$

under the $|V|$ constraints

$$\sum_{e \in E(v)} x_e \leq 1 \text{ for every } v \in V,$$

and the following $|E|$ constraints

$$x_e \in \{0, 1\} \text{ for every } e \in E.$$

Relaxing $x_e \in \{0, 1\}$ to $x_e \geq 0$ one obtains an instance of LP.
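The relaxation can be strictly better than every matching. A brute-force sketch (hypothetical names) on the triangle: a maximum matching has one edge, while $x_e = 1/2$ for all three edges satisfies the $|V|$ vertex constraints and has value $3/2$. The triangle is not bipartite, which is consistent with Exercise 3.7.2.3 below.

```python
from fractions import Fraction
from itertools import product

def is_feasible(x, vertices, edges):
    """x_e >= 0 and sum_{e in E(v)} x_e <= 1 for every vertex v."""
    return all(xe >= 0 for xe in x) and all(
        sum(xe for xe, e in zip(x, edges) if v in e) <= 1 for v in vertices)

def max_matching_01(vertices, edges):
    """Optimum of the 0/1 program by brute force over x in {0,1}^|E|."""
    return max(sum(x) for x in product((0, 1), repeat=len(edges))
               if is_feasible(x, vertices, edges))

if __name__ == "__main__":
    V, E = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]  # a triangle
    half = [Fraction(1, 2)] * len(E)
    print(max_matching_01(V, E), is_feasible(half, V, E), sum(half))
```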

Exercise 3.7.2.2. Express the instances of the maximum matching problem as instances of LP in the canonical form. □

Exercise 3.7.2.3. Consider the maximum matching problem for bipartite graphs only. Prove that every optimal basic solution (vertex of the polytope of feasible solutions) of its relaxation to LP is also a feasible (Boolean) solution to the original instance of the maximum matching problem (i.e., that this problem is in P). □

Exercise 3.7.2.4. Consider the following generalization of the maximum matching problem. Given a weighted graph $(G, c)$, $G = (V, E)$, $c: E \to \mathbb{N}$, find a perfect matching with the minimal cost, where the cost of a matching $H$ is $cost(H) = \sum_{e \in H} c(e)$.

A perfect matching is a matching $H$, where every edge $e$ from $E$ is either in $H$ or shares one common vertex with an edge in $H$. Express the input instances of this minimization problem as input instances of 0/1-LP. □

Exercise 3.7.2.5. Express the minimum spanning tree problem as a 0/1-LP problem. □

Exercise 3.7.2.6. A cycle cover of a graph $G = (V, E)$ is any subgraph $G_C = (V, E_C)$ of $G$, where every vertex has degree exactly two. The minimum cycle cover problem is to find, for every weighted complete graph $(G, c)$, a cycle cover of $G$ with the minimal cost with respect to $c$. Relax this problem to the standard form of LP. □







MAKESPAN SCHEDULING PROBLEM. 

Let $(p_1, p_2, \ldots, p_n, m)$ be an instance of MS. Remember that $p_i \in \mathbb{N} - \{0\}$ is the processing time of the $i$-th job on any of the $m$ identical machines for $i = 1, \ldots, n$. The task is to distribute the $n$ jobs to $m$ machines in such a way that the whole processing time (makespan, i.e., the time after which every machine has processed all its assigned jobs) is minimized.

We consider the Boolean variables $x_{ij} \in \{0, 1\}$ for $i = 1, \ldots, n$, $j = 1, \ldots, m$ with the following meaning:

$$x_{ij} = 1 \text{ iff the } i\text{-th job was assigned to the } j\text{-th machine.}$$

The $n$ linear equalities

$$\sum_{j=1}^{m} x_{ij} = 1 \text{ for all } i \in \{1, \ldots, n\}$$

guarantee that each job is assigned to exactly one machine and so each job will be processed exactly once. Now, we take an integral variable $t$ for the makespan. Thus, the objective is to

minimize $t$

under the $n$ linear equalities above and the constraints

$$\sum_{i=1}^{n} p_i \cdot x_{ij} - t \leq 0 \text{ for all } j \in \{1, \ldots, m\}$$

that assure that every machine finishes the work on its jobs in time at most $t$.

There is also a possibility to express $(p_1, p_2, \ldots, p_n, m)$ as a problem instance of 0/1-LP that looks more natural than the use of the integer variable $t$. Since the $m$ machines are identical, we may assume without loss of generality that the first machine always has the maximal load. Then we minimize

$$\sum_{i=1}^{n} p_i \cdot x_{i1},$$

which is the makespan of the first machine. The constraints

$$\sum_{j=1}^{m} x_{ij} \geq 1 \text{ for all } i \in \{1, \ldots, n\}$$

guarantee that each job is assigned to at least one machine, and the constraints

$$\sum_{i=1}^{n} p_i \cdot x_{ij} \leq \sum_{i=1}^{n} p_i \cdot x_{i1} \text{ for all } j \in \{2, \ldots, m\}$$

ensure that the makespan of the first machine is at least as large as the makespan of any other machine.
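To see that the two formulations agree, the following brute-force sketch evaluates both on tiny made-up instances; the names and instances are hypothetical, and enumerating all $x_{ij}$ only serves to make the constraints concrete.

```python
from itertools import product

def makespan_with_t(p, m):
    """First formulation: minimize t s.t. sum_j x_ij = 1, sum_i p_i x_ij - t <= 0."""
    n, best = len(p), None
    for machine_of in product(range(m), repeat=n):  # satisfies sum_j x_ij = 1
        x = [[1 if machine_of[i] == j else 0 for j in range(m)] for i in range(n)]
        # the smallest t satisfying all m load constraints is the maximal load
        t = max(sum(p[i] * x[i][j] for i in range(n)) for j in range(m))
        best = t if best is None else min(best, t)
    return best

def makespan_first_machine(p, m):
    """Second formulation: minimize load(1) over x in {0,1}^(n*m) with
    sum_j x_ij >= 1 and load(j) <= load(1) for j = 2, ..., m."""
    n, best = len(p), None
    for bits in product((0, 1), repeat=n * m):
        x = [bits[i * m:(i + 1) * m] for i in range(n)]
        if any(sum(row) < 1 for row in x):
            continue
        load = [sum(p[i] * x[i][j] for i in range(n)) for j in range(m)]
        if all(load[j] <= load[0] for j in range(1, m)):
            best = load[0] if best is None else min(best, load[0])
    return best

if __name__ == "__main__":
    for p in ([3, 3, 2, 2, 2], [5, 4, 3]):
        print(p, makespan_with_t(p, 2), makespan_first_machine(p, 2))
```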







MAXIMUM SATISFIABILITY PROBLEM. 

Let $F = F_1 \wedge F_2 \wedge \ldots \wedge F_m$ be a formula in CNF over the set of variables $X = \{x_1, x_2, \ldots, x_n\}$. We use the same Boolean variables $x_1, x_2, \ldots, x_n$ for the relaxation and take the additional variables $z_1, z_2, \ldots, z_m$ with the meaning

$$z_i = 1 \text{ iff the clause } F_i \text{ is satisfied.}$$

Let, for each $i \in \{1, \ldots, m\}$, $In^{+}(F_i)$ be the set of indices of the variables from $X$ which appear as positive literals in $F_i$ and let $In^{-}(F_i)$ be the set of the indices of the negated variables that appear as literals in $F_i$. For instance, for the clause $C = x_1 \vee \overline{x}_3 \vee x_7 \vee \overline{x}_9$ we have $In^{+}(C) = \{1, 7\}$ and $In^{-}(C) = \{3, 9\}$. Now, the instance $LP(F)$ of IP that expresses $F$ is to

$$\text{maximize } \sum_{j=1}^{m} z_j$$

subject to the following $2m + n$ constraints

$$z_j - \sum_{i \in In^{+}(F_j)} x_i - \sum_{l \in In^{-}(F_j)} (1 - x_l) \leq 0 \text{ for } j = 1, \ldots, m,$$
$$x_i \in \{0, 1\} \text{ for } i = 1, \ldots, n,$$
$$z_j \in \{0, 1\} \text{ for } j = 1, \ldots, m.$$

The linear inequality

$$z_j \leq \sum_{i \in In^{+}(F_j)} x_i + \sum_{l \in In^{-}(F_j)} (1 - x_l)$$

assures that $z_j$ may take the value 1 only if at least one of the literals in $F_j$ takes the value 1. Relaxing $x_i \in \{0, 1\}$ to

$$x_i \geq 0 \text{ and } x_i \leq 1$$

for $i = 1, \ldots, n$ and relaxing $z_j \in \{0, 1\}$ to

$$z_j \geq 0$$

for $j = 1, \ldots, m$ results in an irregular form of LP. Using additional variables one can obtain any of its normal forms.
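A sketch of the index sets and of the optimum of $LP(F)$ over Boolean values, computed by brute force over $x$. It assumes clauses are encoded as lists of nonzero integers ($+i$ for $x_i$, $-i$ for $\overline{x}_i$); this encoding and all names are our own choice. For fixed Boolean $x$, the best feasible $z_j$ is 1 exactly when the right-hand side of its constraint is at least 1, i.e., when $F_j$ is satisfied.

```python
from itertools import product

def index_sets(clause):
    """(In+, In-) of a clause; literal +i stands for x_i, -i for its negation."""
    return {lit for lit in clause if lit > 0}, {-lit for lit in clause if lit < 0}

def clause_bound(clause, x):
    """sum_{i in In+} x_i + sum_{l in In-} (1 - x_l): the upper bound on z_j."""
    pos, neg = index_sets(clause)
    return sum(x[i] for i in pos) + sum(1 - x[lit] for lit in neg)

def max_sat_ip(clauses, n):
    """Optimum of LP(F) with x and z Boolean, by enumerating all x."""
    best = 0
    for bits in product((0, 1), repeat=n):
        x = dict(zip(range(1, n + 1), bits))
        best = max(best, sum(1 for cl in clauses if clause_bound(cl, x) >= 1))
    return best

if __name__ == "__main__":
    print(index_sets([1, -3, 7, -9]))
    print(max_sat_ip([[1, 2], [-1], [-2], [1, -2]], 2))
```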

SET COVER PROBLEM. 

Let $(X, \mathcal{F})$ with $X = \{a_1, \ldots, a_n\}$ and $\mathcal{F} = \{S_1, S_2, \ldots, S_m\}$, $S_i \subseteq X$ for $i = 1, \ldots, m$, be an instance of the set cover problem (SCP). For $i = 1, \ldots, m$ we consider the Boolean variable $x_i$ with the meaning

$$x_i = 1 \text{ iff } S_i \text{ is picked for the set cover.}$$





Let $Index(k) = \{d \in \{1, \ldots, m\} \mid a_k \in S_d\}$ for $k = 1, \ldots, n$. Then we have to

$$\text{minimize } \sum_{i=1}^{m} x_i$$

under the following $n$ linear constraints

$$\sum_{j \in Index(k)} x_j \geq 1 \text{ for } k = 1, \ldots, n.$$
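A brute-force sketch of this 0/1-LP with hypothetical names and a made-up instance; it computes $Index(k)$ explicitly. Relaxing $x_i \in \{0, 1\}$ to $x_i \geq 0$ would give the corresponding LP.

```python
from itertools import product

def set_cover_01lp(X, family):
    """Minimize sum x_i s.t. sum_{j in Index(k)} x_j >= 1 for every element a_k."""
    index = {a: [j for j, S in enumerate(family) if a in S] for a in X}  # Index(k)
    best = None
    for x in product((0, 1), repeat=len(family)):
        if all(sum(x[j] for j in index[a]) >= 1 for a in X):
            if best is None or sum(x) < sum(best):
                best = x
    return best  # None if some element lies in no set

if __name__ == "__main__":
    X = {1, 2, 3, 4, 5}
    family = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
    print(set_cover_01lp(X, family))  # characteristic vector of a minimum cover
```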

Exercise 3.7.2.7. Consider the set multicover problem as the following minimization problem. For a given instance $(X, \mathcal{F}, r)$, where $(X, \mathcal{F})$ is an instance of SCP and $r \in \mathbb{N} - \{0\}$, a feasible solution to $(X, \mathcal{F}, r)$ is an $r$-multicover of $X$ that covers each element of $X$ at least $r$ times. In a multicover it is allowed to use the same set $S \in \mathcal{F}$ several times. Find a relaxation of the set multicover problem to LP. □

Exercise 3.7.2.8. Consider the hitting set problem (HSP) defined as follows.

Input: $(X, \mathcal{S}, c)$, where $X$ is a finite set, $\mathcal{S} \subseteq Pot(X)$, and $c$ is a function from $X$ to $\mathbb{N} - \{0\}$.

Constraints: For every input $(X, \mathcal{S}, c)$,

$$\mathcal{M}(X, \mathcal{S}, c) = \{Y \subseteq X \mid Y \cap S \neq \emptyset \text{ for every } S \in \mathcal{S}\}.$$

Costs: For every $Y \in \mathcal{M}(X, \mathcal{S}, c)$, $cost(Y, (X, \mathcal{S}, c)) = \sum_{x \in Y} c(x)$.

Goal: minimum.

Show that

(i) the hitting set problem (HSP) can be relaxed to an LP problem, and
(ii*) the set cover problem can be polynomial-time reduced to HSP.

□ 



3.7.3 The Simplex Algorithm 

There is a large amount of literature on methods for solving the linear pro- 
gramming problem and there are a couple of thick books devoted to the study 
of different versions of the simplex method. We do not want to go into detail here. The aim of this section is only to roughly present the Simplex algorithm in a way that enables us to see it as a local search algorithm in
some geometrical interpretation. The reason for restricting our attention to 
solving the linear programming problem in the above sense is that this is a 
topic of operations research, and the main interest from the combinatorial op- 
timization point of view focuses on the possibility to express (combinatorial) 





optimization problems as problems of linear programming 27 (and so to use 
algorithms of operations research as a subpart of algorithms for solving the 
original combinatorial optimization problems). 

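Consider the standard form of the linear programming problem, that is, for given $A = [a_{ji}]_{j=1,\ldots,m,\ i=1,\ldots,n}$, $b = (b_1, \ldots, b_m)^T$, and $c = (c_1, \ldots, c_n)^T$,

$$\text{to minimize } \sum_{i=1}^{n} c_i x_i$$

under the constraints

$$\sum_{i=1}^{n} a_{ji} x_i = b_j \text{ for } j = 1, \ldots, m, \text{ and}$$
$$x_i \geq 0 \text{ for } i = 1, \ldots, n.$$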



Recall Section 2.2.1, where we showed that the set $Sol(A)$ of solutions of a system of homogeneous linear equations $AX = 0$ is a subspace of $\mathbb{R}^n$ and that the set $Sol(A, b)$ of solutions of a system $AX = b$ of linear equations is an affine subspace of $\mathbb{R}^n$. A subspace of $\mathbb{R}^n$ always contains the origin $(0, 0, \ldots, 0)^T$, and any affine subspace can be viewed as a subspace that is shifted (translated) by a vector from $\mathbb{R}^n$. So, if $Sol(A, b)$ is nonempty, its dimension is that of $Sol(A)$. Remember that the dimension of $Sol(A, b)$ for one linear equation ($m = 1$) with not all coefficients equal to zero is exactly $n - 1$. In general, $\dim(Sol(A, b)) = n - rank(A)$ whenever $Sol(A, b) \neq \emptyset$.

Definition 3.7.3.1. Let $n$ be a positive integer. An affine subspace of $\mathbb{R}^n$ of dimension $n - 1$ is called a hyperplane. Alternatively, a hyperplane of $\mathbb{R}^n$ is a set of all $X = (x_1, \ldots, x_n)^T \in \mathbb{R}^n$ such that

$$a_1 x_1 + a_2 x_2 + \ldots + a_n x_n = b$$

for some $a_1, a_2, \ldots, a_n, b$, where not all $a$'s are equal to zero. The sets

$$HS_{\geq}(a_1, \ldots, a_n, b) = \Big\{X = (x_1, \ldots, x_n)^T \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} a_i x_i \geq b\Big\}, \text{ and}$$
$$HS_{\leq}(a_1, \ldots, a_n, b) = \Big\{X = (x_1, \ldots, x_n)^T \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} a_i x_i \leq b\Big\}$$

are called halfspaces.

Obviously, any halfspace of $\mathbb{R}^n$ has the same dimension $n$ as $\mathbb{R}^n$, and it is a convex set. Since any finite intersection of convex sets is a convex set (see Section 2.2.1), the set



27 Recently, even as problems of semidefinite programming or other generalizations 
of linear programming that can be solved in polynomial time. 







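$$\{X \in \mathbb{R}^n \mid A \cdot X \leq b\} = \bigcap_{j=1}^{m} HS_{\leq}(a_{j1}, \ldots, a_{jn}, b_j) = \Big\{X = (x_1, \ldots, x_n)^T \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} a_{ji} x_i \leq b_j \text{ for } j = 1, \ldots, m\Big\},$$

called $Polytope(AX \leq b)$, is a convex set.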

Definition 3.7.3.2. Let $n$ be a positive integer. The intersection of a finite number of halfspaces of $\mathbb{R}^n$ is a (convex) polytope of $\mathbb{R}^n$. For given constraints $\sum_{i=1}^{n} a_{ji} x_i \leq b_j$, for $j = 1, \ldots, m$, and $x_i \geq 0$ for $i = 1, \ldots, n$,

$$Polytope(AX \leq b, X \geq 0_{n \times 1}) = \{X \in (\mathbb{R}^{\geq 0})^n \mid A \cdot X \leq b\}.$$

Observe that

$$Polytope(AX \leq b, X \geq 0_{n \times 1}) = \bigcap_{j=1}^{m} HS_{\leq}(a_{j1}, \ldots, a_{jn}, b_j) \;\cap\; \bigcap_{j=1}^{n} \{(x_1, \ldots, x_n)^T \in \mathbb{R}^n \mid x_j \geq 0\}$$

is the intersection of $m + n$ halfspaces.

Consider the following constraints in $\mathbb{R}^2$:

$$x_1 + x_2 \leq 8$$
$$x_2 \leq 6$$
$$x_1 - x_2 \leq 4$$
$$x_1 \geq 0$$
$$x_2 \geq 0.$$

The polytope depicted in Figure 3.19 is exactly the polytope that corresponds to these constraints. Consider minimizing $x_1 - x_2$ and maximizing $x_1 - x_2$ for these constraints. Geometrically, one can solve this problem as follows. One takes the straight line $x_1 - x_2 = 0$ (see Figure 3.20) and moves it across the polytope in both directions perpendicular to this straight line. In the “upwards” direction (i.e., in the direction of the halfplane $x_1 - x_2 < 0$) this straight line leaves the polytope at the point $(0, 6)$, and this point ($x_1 = 0$, $x_2 = 6$) is the unique optimal solution of this minimization problem. In the “downwards” direction (i.e., in the direction of the halfplane $x_1 - x_2 > 0$) this straight line leaves the polytope in the set of points of the line segment connecting the points $(4, 0)$ and $(6, 2)$. Thus,

$$\{(x_1, x_2)^T \in \mathbb{R}^2 \mid (4, 0) \leq (x_1, x_2) \leq (6, 2),\ x_1 - x_2 = 4\}$$

is the infinite set of all optimal solutions of the maximization problem.
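Since this polytope is bounded, both optima are attained at vertices. For such a small example one can therefore intersect every pair of boundary lines, keep the feasible intersection points, and evaluate the objective there. The following sketch (hypothetical names) does this with exact fractions; it is not the simplex method, only a check of the geometric argument.

```python
from fractions import Fraction
from itertools import combinations

# a1*x1 + a2*x2 <= b as (a1, a2, b); x1 >= 0 and x2 >= 0 become -x1 <= 0, -x2 <= 0
CONSTRAINTS = [(1, 1, 8), (0, 1, 6), (1, -1, 4), (-1, 0, 0), (0, -1, 0)]

def vertices(constraints):
    """Feasible intersection points of pairs of boundary lines."""
    points = set()
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines do not meet
        x1 = Fraction(b * c2 - a2 * d, det)  # Cramer's rule
        x2 = Fraction(a1 * d - b * c1, det)
        if all(p * x1 + q * x2 <= r for p, q, r in constraints):
            points.add((x1, x2))
    return points

def optimize(objective, constraints):
    """((min value, minimizing vertices), (max value, maximizing vertices))."""
    pts = vertices(constraints)

    def value(pt):
        return objective[0] * pt[0] + objective[1] * pt[1]

    lo = min(value(pt) for pt in pts)
    hi = max(value(pt) for pt in pts)
    return ((lo, sorted(pt for pt in pts if value(pt) == lo)),
            (hi, sorted(pt for pt in pts if value(pt) == hi)))

if __name__ == "__main__":
    for obj in ((1, -1), (3, -2)):
        print(obj, optimize(obj, CONSTRAINTS))
```

For $x_1 - x_2$ it reports the minimum $-6$ at $(0, 6)$ and the maximum $4$ at both endpoints $(4, 0)$ and $(6, 2)$ of the optimal segment. For the objective $3x_1 - 2x_2$ discussed next, it reports the minimum $-12$ at $(0, 6)$ and the maximum $14$ at $(6, 2)$.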







0 and move it in both perpendicular directions to this straight line as depicted in Figure 3.21. Moving in the direction of the halfplane $3x_1 - 2x_2 < 0$, this straight line leaves the polytope at the point $(0, 6)$, and so $x_1 = 0$, $x_2 = 6$ is the unique optimal solution of the minimization problem. Moving in the direction of the halfplane $3x_1 - 2x_2 > 0$, this straight line leaves the polytope at the point $(6, 2)$, and so $x_1 = 6$, $x_2 = 2$ is the unique optimal solution of the maximization problem.




Fig. 3.21. 



To support our visual intuition concerning the linear programming problem we consider one more example in $\mathbb{R}^3$. Figure 3.22 involves the polytope that corresponds to the following constraints:

$$x_1 + x_2 + x_3 \leq 8$$
$$x_1 \leq 4$$
$$3x_2 + x_3 \leq 12$$
$$x_3 \leq 6$$
$$x_1 \geq 0$$
$$x_2 \geq 0$$
$$x_3 \geq 0$$

Consider the tasks minimizing $2x_1 + x_2 + x_3$ and maximizing $2x_1 + x_2 + x_3$. Take the hyperplane $2x_1 + x_2 + x_3 = 0$. To minimize $2x_1 + x_2 + x_3$ move




